Preface


The IBM® Smart Analytics System is a fully-integrated and scalable data 
warehouse solution that combines software, server, and storage resources to 
offer optimal business intelligence and information management performance for 
enterprises. 

This IBM Redbooks® publication introduces the architecture and components of 
the IBM Smart Analytics System family. We describe the installation and 
configuration of the IBM Smart Analytics System and show how to manage the 
systems effectively to deliver an enterprise class service. 

This book explains the importance of integrating the IBM Smart Analytics System 
with the existing IT environment, as well as how to leverage investments in 
security, monitoring, and backup infrastructure. We discuss the monitoring tools 
for both operating systems and DB2®. Advanced configuration, performance
troubleshooting, and tuning techniques are also discussed. 

This book is targeted at the architects and specialists who need to know the 
concepts and the detailed instructions for a successful IBM Smart Analytics 
System implementation and operation. 

IBM Smart Analytics System


In this chapter we introduce the IBM Smart Analytics System, including the 
benefits offered. We describe the features and architecture of the IBM Smart 
Analytics System. 

We cover the following topics: 

► An overview of the IBM Smart Analytics System 

► The IBM Smart Analytics System portfolio 

1.1 Overview


Nowadays, enterprises recognize the value of business analytics and are moving 
to apply these capabilities to add business value. However, implementing a data 
warehouse solution requires resources and expertise in business intelligence 
software, server hardware, storage, and the help of professional services. The 
traditional system implementation method for this complex integration effort costs
a company both time and money.

IBM Smart Analytics System, taking advantage of the appliance architecture, is a 
pre-integrated analytics system designed to deploy quickly and deliver fast time 
to value. Because the software is already installed and configured in the server, 
IBM clients are able to have their systems up and running in days instead of 
months. Engineered for the rapid deployment of a business-ready solution, the 
IBM Smart Analytics System includes the following features: 

► A powerful data warehouse foundation 

► Extensive analytic capabilities 

► A scalable environment that is integrated with IBM servers and storage 

► Set-up services and single point of support 

The IBM Smart Analytics System comes in a number of offerings. IBM 
professionals with expertise in data warehouse applications help you select the 
proper IBM Smart Analytics System based on the data and user capacity 
needed. To add capacity over time, you can mix various generations of hardware, 
enabling you to protect your investment in the long term. 

Every IBM Smart Analytics System offering provides a set of resources to support a
complete data warehousing solution. At the heart of the IBM Smart Analytics 
System is a data warehouse based on DB2 Enterprise Server Edition software 
and the Database Partitioning Feature that incorporates best practices based on 
decades of IBM experience designing and implementing data warehouses. 

The analytics, workload management, and performance analysis capabilities 
provided by the InfoSphere Warehouse software depend on the specific edition 
of the software that your offering includes, but in most cases include the following 
features: 

► Data modeling and design provided through Design Studio 

► Data movement and transformation provided through the SQL Warehousing
Tool

► OLAP functions provided through Cubing Services 

► OLAP visualization provided through Alphablox 

► In-database data mining provided through Intelligent Miner and MiningBlox 

► Data mining visualization provided through Intelligent Miner Visualization

► Unstructured text analysis provided through Text Analytics 

► Integrated workload management provided through DB2 workload manager 

► Deep compression for data, index, temporary tables, and XML provided by 
the DB2 Storage Optimization feature 

► Performance tuning and analysis through DB2 Performance Expert 

The analytics capabilities provided by the optional IBM Cognos 8 BI software
include reporting, query, and dashboarding capabilities. These capabilities allow 
you to perform complex analysis on your data to identify trends in business 
performance, and represent your insight visually through reports or at-a-glance 
dashboards. 

An important advantage of the IBM Smart Analytics System offerings is that they 
are delivered to you fully set up and configured using customized information 
about your environment such as the IP addresses, user names, and the 
database name. Before handing the system over to you, IBM professionals verify 
the setup, perform final quality validation, and provide training. The system is 
then ready for you to begin creating database objects and loading data. The time 
between the purchase decision and the delivery of the system to your site, ready 
for you to begin loading data, can be as little as two weeks! 

Having the system built and configured by IBM not only speeds up the process but
also ensures that every piece of the system has been tested and verified for
compatibility. This verification covers the operating system and software levels
(including fix packs) and the firmware level of every hardware component involved,
including switches, hard disk controllers, and servers.

This solution runs on reliable IBM server hardware platforms. Because
redundant hardware components are used, there is no single point of failure in
any of the servers, storage controllers, hard disks, Ethernet switches, SAN
switches, network interface cards, internal networks, power supplies, or input
power. An additional high availability configuration can help a system recover
from other hardware and software failures using server failover.

Another benefit for IBM Smart Analytics System customers is the single point of 
support. A single phone number is used for any support needed, be it for 
hardware or software. 

IBM can also speed up the development of database models for industry 
solutions in many areas such as retail, insurance, banking, telecommunications, 
health care insurance, and health care providers. 

1.1.1 Architecture


The IBM Smart Analytics System 5600, 7600, and 7700 are built upon a building 
block concept known as modules. Certain modules are mandatory and others 
are optional, depending on the quantity of data you have, your concurrency 
needs, and the analytics software you require. You can start from the required 
basic modules and add new modules when your business grows and the system
requirements increase.

Figure 1-1 illustrates the concept of IBM Smart Analytics System modules. 



Management module 

The management module is the starting point for all IBM Smart Analytics System 
offerings. The management module replaces the eliminated foundation module. 
This module provides the base functionality for all other modules. The basic 
management module contains one management node. 

The management node is a server used to automate the process of building the 
other servers at installation time. It also houses management software such as 
IBM DS Storage Manager and DB2 Performance Expert. In certain 
configurations, it hosts the IBM Smart Analytics System Control Console, which 
provides automated system-level maintenance capability. 

User modules

Each user module contains an administration node. Administration nodes, apart 
from the first one, are often called user nodes. The first administration node hosts 
a single database partition that stores the catalog tables for the core warehouse 
database, stores the non-partitioned data belonging to the core warehouse 
database, and acts as a coordinator for user connections. 

User nodes can store non-partitioned data and act as a coordinator for user 
connections, but unlike the first administration node, they do not hold catalog 
tables. These nodes are optional. They can act as additional DB2 coordinator 
nodes by helping to balance database connections. 

The first user module in an IBM Smart Analytics System is required because a
configuration must have at least one administration node. The additional
administration nodes are optional and act as additional DB2 coordinator
nodes for balancing the database connections. For example, all the
user connections can be routed to the user node, allowing the administration
node to focus on application requests.

Data module 

As the name implies, the data module is where the partitioned data is stored. 
Every data module includes one data node that hosts multiple database 
partitions. An IBM Smart Analytics System must have at least one data module. 
Depending on the IBM Smart Analytics System, there are four or eight DB2 
database partitions per data module. 
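
As a rough illustration of this layout, the following Python sketch lists the database partitions of a system in the three-column form used by the DB2 node configuration file (partition number, host name, logical port). The host names, the consecutive numbering scheme, and the function name are hypothetical and are not taken from an actual IBM Smart Analytics System build; the real values are defined when IBM configures the system.

```python
def partition_layout(data_nodes, partitions_per_data_node):
    """Return db2nodes.cfg-style lines for one administration node plus data nodes.

    Assumptions (not from the source text): partition 0 is the single
    partition on the administration node, data partitions are numbered
    consecutively after it, and the host names are placeholders.
    """
    if partitions_per_data_node not in (4, 8):
        raise ValueError("the text describes four or eight partitions per data module")
    if data_nodes < 1:
        raise ValueError("a system must have at least one data module")

    # Administration node: catalog tables, non-partitioned data, coordinator.
    lines = ["0 admin01 0"]
    partition = 1
    for node in range(1, data_nodes + 1):
        host = f"data{node:02d}"  # hypothetical host name
        # Each partition on the same host gets its own logical port.
        for logical_port in range(partitions_per_data_node):
            lines.append(f"{partition} {host} {logical_port}")
            partition += 1
    return lines
```

Calling `partition_layout(2, 4)` yields nine entries: one for the administration node and four for each of the two data nodes.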

Failover module 

The failover module is configured as a high availability module similar to the data
module, but without storage disks. The failover module stands by and
substitutes for any failing administration, user, or data module within its high
availability group. Tivoli® System Automation for Multiplatforms constantly
monitors the DB2 resources (hardware and software) and substitutes the
failover module for a failing module to restore normal system operation. The failover
process takes a few minutes, and all uncommitted database
operations are rolled back and must be resubmitted. Depending on the
IBM Smart Analytics System server family, there is one failover module for each
group of four or eight modules.
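
The sizing rule for failover modules can be sketched in Python as follows. The function name is hypothetical, and the sketch assumes that the group size counts only the active administration, user, and data modules, which the text does not state explicitly.

```python
import math


def failover_modules_needed(active_modules, group_size):
    """Estimate the number of failover modules for a system.

    Assumption: one failover module protects each high availability group
    of up to `group_size` active (administration, user, or data) modules.
    """
    if group_size not in (4, 8):
        raise ValueError("group size is four or eight, depending on the server family")
    if active_modules < 1:
        raise ValueError("a system has at least one active module")
    # A partially filled group still needs its own failover module.
    return math.ceil(active_modules / group_size)
```

For example, under this assumption a system with nine active modules and a group size of eight needs two failover modules.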

Warehouse applications module 

The warehouse applications module is implemented using InfoSphere 
Warehouse software. Together with the business intelligence module, it provides 
analytics capability in an IBM Smart Analytics System. There can be a single 
warehouse applications module in an IBM Smart Analytics System. 

A warehouse applications module can contain one or two nodes. The warehouse
applications node is required, and the warehouse OLAP node is optional:

► Warehouse applications node: 

The Warehouse applications node contains all of the InfoSphere Warehouse 
components that are in the application server tier of the InfoSphere 
Warehouse architecture. 

The node includes the following InfoSphere Warehouse software
components:

- InfoSphere Warehouse Administration Console

- SQL Warehousing Tool (SQW)

- Alphablox OLAP visualization tool

- MiningBlox application programming interface (to extend Alphablox
components with data mining functionality)

Certain components are hosted on the WebSphere Application Server 
software and are accessible through a web browser. The Cubing Services 
component can also execute in this node, if the optional OLAP node is not 
present. 

► Warehouse OLAP node: 

The warehouse OLAP node is an optional node for the warehouse 
applications module. It has two functions: 

- Executes the Cubing Services Cube Server in a high OLAP utilization 
scenario. 

- Allows for an active-active high availability (HA) configuration to be 
implemented. In this HA configuration, either node in the module can fail 
over to the other node. 

Business intelligence module 

The business intelligence module is implemented using Cognos 8 Business 
Intelligence software. It can contain two or more nodes where the maximum 
number depends on the specific offering. IBM Cognos Analytic Applications 
deliver the packaged reports and analysis for assessing performance of specific 
functional domains including finance, customer, supply chain and workforce. 

These applications help you gain insight so that you can make better business
decisions, faster and more cost-effectively, in each
business area. The business intelligence module is available in IBM Smart
Analytics System 7700, 7600, and 5600. Cognos software is included with other 
offerings, but not as part of a business intelligence module. 

Expanding the IBM Smart Analytics System

The module architecture of the IBM Smart Analytics System provides flexibility in 
expanding your system as the business grows. As the customer database 
activities increase, new modules can be added: 

► Add data modules to the existing system when the data volume increases. 

► Extend your business intelligence module by adding more business
intelligence extension nodes to manage an increased number of report users.

► Add user modules to manage a large number of users and to balance the 
data accessing load. 

Figure 1-2 shows the IBM Smart Analytics System building block examples. 



1.2 IBM Smart Analytics System portfolio 

The IBM Smart Analytics System family offers a wide range of hardware 
platforms and architectures to provide customers with an optimal data 
warehousing system for their business size. From a small, all-in-one Linux or 
Windows powered server, to an AIX or mainframe enterprise solution, IBM Smart 
Analytics System is the perfect data warehouse solution. 

1.2.1 IBM Smart Analytics System 1050 and 2050


The IBM Smart Analytics System 1050 and 2050 are the entry level solutions 
that are intended for midsize businesses and departmental usage. 

The IBM Smart Analytics System 1050 is a single server system appropriate for 
database sizes ranging from 300 GB (using only internal disks) to 3.3 TB of data 
(using a dedicated storage controller). The sizes mentioned are for user space. 

The IBM Smart Analytics System 2050 is the next step up in the family. It is also 
a single server system that is designed for database sizes from 3.3 TB to 
13.2 TB. This system uses up to four dedicated storage controllers. 

Both systems employ IBM System x® servers (Intel® based) and can be 
installed on Novell SUSE Linux 11 or Windows Server 2008. Cognos
Business Intelligence is offered as an optional feature.


1.2.2 IBM Smart Analytics System 5600

The IBM Smart Analytics System 5600 product family is built upon IBM System x 
hardware and uses SUSE Linux as the operating system. IBM Smart Analytics 
System 5600 is the IBM solution for medium to large companies that need 
powerful analytics capabilities and growth flexibility at an exceptional 
price-to-performance ratio. 

The IBM Smart Analytics System 5600 data modules, when configured using the 
standard 300 GB disks, can store 6 TB of user space. For an increased data 
density, 450 GB and 600 GB disks are available. 

The IBM Smart Analytics System 5600 has two offerings: 5600 V1 and 5600 V2.

IBM Smart Analytics System 5600 V1

The IBM Smart Analytics System 5600 V1 offering uses System x3650 M2
servers and DS3400 storage. In the standard configuration for this offering, each 
x3650 M2 server is configured with one quad-core processor and 32 GB of 
memory. Each data node is attached to two DS3400 external storage servers 
with 300 GB disks, for a total of 12 TB of user space per data node. For 
increased data density, 450 GB and 600 GB disks are available. 

Figure 1-3 depicts a common IBM Smart Analytics System 5600 V1 layout.

Figure 1-3 IBM Smart Analytics System 5600 V1 layout

The IBM Smart Analytics System 5600 V1 with SSD option is a more powerful
version of this offering. This option adds one additional quad-core processor, an 
additional 32 GB of memory, and 640 GB of Solid State Devices (SSD) to each 
administration, data, and standby node. With this option, the DS3400s use 
450 GB disks as standard, for a total of 9 TB of user space per data node. 

IBM Smart Analytics System 5600 V2 

The IBM Smart Analytics System 5600 V2 offering uses System x3650 M3 
servers and DS3524 storage. In the standard configuration for this offering, each 
x3650 M3 server is configured with one six-core processor and 64 GB of 
memory (except for the management node, which has only 32 GB of memory). 
Each data node is attached to two DS3524 external storage servers with 300 GB 
or 600 GB disks, for a total of 12 TB to 24 TB of user space per data node. 

Figure 1-4 depicts a common IBM Smart Analytics System 5600 V2 layout.

Figure 1-4 IBM Smart Analytics System 5600 V2 layout


The IBM Smart Analytics System 5600 V2 with SSD option is a more powerful 
version of this offering. This option adds one additional six-core processor, an 
additional 64 GB of memory, and 640 GB of Solid State Devices (SSD) to each 
administration, data, and standby node. 

1.2.3 IBM Smart Analytics System 7600 and 7700

The IBM Smart Analytics System 7600 and 7700 offerings are designed for
enterprise-wide reporting and analytics purposes. Members of this family use IBM
POWER processors for mission critical performance and reliability, supporting
complex workloads for a large number of concurrent users.

The IBM Smart Analytics System 7600 uses POWER6 processors,
whereas the new IBM Smart Analytics System 7700 is based on POWER7 processors.

The IBM Smart Analytics System 7600 has 4 TB of user space per data module. The IBM
Smart Analytics System 7700 provides 28 TB of user storage per data module
when using 300 GB hard disks, and 56 TB of user storage per data module when
using 600 GB disks. Another benefit of the IBM Smart Analytics System 7700 is that
it comes standard with 800 GB of solid-state devices (SSD) per data module, with
an optional expansion to 4.8 TB.
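
The per-module figures quoted in this section lend themselves to a simple sizing calculation. The following Python sketch estimates how many data modules a target volume of user data requires. The function and table names are hypothetical, and the sketch deliberately ignores factors such as compression, growth reserves, and workload, which a real sizing exercise with IBM specialists would take into account.

```python
import math

# User space per data module in TB, as quoted in this section.
# Keys are (model, disk size in GB); None means the text gives no disk size.
USER_SPACE_TB = {
    ("7600", None): 4,
    ("7700", 300): 28,
    ("7700", 600): 56,
}


def data_modules_for(target_tb, model, disk_gb=None):
    """Return the minimum number of data modules that hold `target_tb` of user data."""
    if target_tb <= 0:
        raise ValueError("target volume must be positive")
    per_module = USER_SPACE_TB[(model, disk_gb)]
    # Round up: a partially used data module is still a whole module.
    return math.ceil(target_tb / per_module)
```

For a 100 TB target, this estimate gives 25 data modules on the 7600, four on a 7700 with 300 GB disks, and two on a 7700 with 600 GB disks.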

Figure 1-5 illustrates the IBM Smart Analytics System 7600 layout. Each data 
module is allocated two EXP5000 disk enclosures. The administration node also
has an additional disk expansion drawer that holds hot spare drives.



Figure 1-5 IBM Smart Analytics System 7600 layout 


The IBM Smart Analytics System 7700 has a unique configuration, compared to 
the IBM Smart Analytics System 7600. Each 7700 data node has eight DB2 
database partitions and is allocated four DS3524 storage servers. Each 
database partition is allocated half of a DS3524. 
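
The arithmetic behind this allocation (eight partitions, each using half of a storage server, gives four storage servers) can be made explicit with a short Python sketch. The server labels, the "A" and "B" names for the two halves, and the pairing of consecutive partitions on one server are assumptions made for illustration only; the text does not describe which partition uses which storage server.

```python
def storage_assignment(partitions=8, storage_servers=4):
    """Map each database partition on a 7700 data node to half of a DS3524.

    Assumption: consecutive partitions share a storage server, and the two
    halves of each server are labeled "A" and "B" (hypothetical labels).
    """
    if partitions != 2 * storage_servers:
        raise ValueError("each storage server is split between exactly two partitions")
    halves = ("A", "B")
    return {
        partition: (f"DS3524-{partition // 2 + 1}", halves[partition % 2])
        for partition in range(partitions)
    }
```

With the default arguments, local partitions 0 and 1 map to the two halves of the first storage server, and partitions 6 and 7 map to the fourth.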

Figure 1-6 shows the IBM Smart Analytics System 7700 layout.



Figure 1-6 IBM Smart Analytics System 7700 layout 


The IBM Smart Analytics System 7700 also has a unique configuration for the 
administration module. The administration node is allocated one DS3524 storage 
server, where half of the space is used for a single database partition that stores 
non-partitioned data and acts as a coordinator for user connections, and the 
other half is used for GPFS-shared directories. 

1.2.4 IBM Smart Analytics System 9600


The IBM Smart Analytics System 9600 combines the availability and security of 
System z® with the characteristics of an appliance, bringing analytic
information to the mainframe business.

In this book, we focus on the IBM Smart Analytics System 5600, 7600, and
7700 offerings, which are all based on IBM DB2 for Linux, UNIX, and Windows. For more
information about the IBM Smart Analytics System 9600, go to these web 
addresses: 

http://www.ibm.com/software/data/infosphere/smart-analytics-system/ 

http://www-01.ibm.com/software/data/infosphere/warehouse-z/ 


1.2.5 IBM Smart Analytics System family summary

Table 1-1 summarizes the key features of the models in the IBM Smart Analytics 
System family. 

Table 1-1 IBM Smart Analytics family comparison 


DB2 database partitions per data node

700 GB, 4.2 TB

1.3 IBM training


Available from IBM training are the newest offerings to support your training 
needs, enhance your skills, and boost your success with IBM software. IBM 
offers a complete portfolio of training options including traditional classroom, 
private onsite, and eLearning courses. Many of our classroom courses are part 
of the IBM “Guaranteed to run program,” ensuring that your course will never be 
canceled. We have a robust eLearning portfolio including Instructor-Led Online 
(ILO) courses, Self Paced Virtual courses (SPVC), and traditional Web Based 
Training (WBT) courses. A perfect complement to classroom training, our 
eLearning portfolio offers something for every need and every budget; simply 
select the style that suits you. 

Be sure to take advantage of our custom training plans to map your path to
acquiring skills. Enjoy further savings when you purchase training at a discount
with an IBM Education Pack online account, which is a flexible and convenient
way to pay for, track, and manage your education expenses online.

The key education resources listed in Table 1-2 have been updated to reflect the 
IBM Smart Analytics System. Check your local Information Management Training 
website or chat with your training representative for the most recent training 
schedule. 


Table 1-2 InfoSphere Warehouse courses 


| Course title | Classroom | Instructor-Led Online | Self Paced Virtual Classroom | Web Based Training |
|---|---|---|---|---|
| InfoSphere Warehouse 9 Components | DW352 | 3W352 | 2W352 | 1W352 |
| InfoSphere Warehouse 9 - SQL Warehouse Tool and Administration Console | DWA52 | 3WA52 | 2WA52 | 1WA52 |
| InfoSphere Warehouse 9 - Cubing Service | DWB52 | 3WB52 | 2WB52 | 1WB52 |
| InfoSphere Warehouse 9 - Data Mining and Unstructured Text Analysis | DWC52 | 3WC52 | 2WC52 | 1WC52 |
| Introduction to TSA in IBM Smart Analytics Systems | DW040 | 3W040 | | 1W040 |
| Advanced TSA within an IBM Smart Analytics System | DW331 | 3W331 | 2W331 | 1W331 |

Descriptions of courses for IT professionals and managers are available at:
http://www.ibm.com/services/learning/ites.wss/tp/en?pageType=tp_search

Visit http://www.ibm.com/training or call IBM training at 800-IBM-TEACH 
(426-8322) for scheduling and enrollment. 


1.3.1 IBM Professional Certification 

Information Management Professional Certification is a business solution for 
skilled IT professionals to demonstrate their expertise to the world. Certification 
validates skills and demonstrates proficiency with the most recent IBM 
technology and solutions. 

1.3.2 Information Management Software Services 

When implementing an Information Management solution, it is critical to have an 
experienced team involved to ensure that you achieve the results you want 
through a proven, low risk delivery approach. The Information Management 
Software Services team has the capabilities to meet your needs, and is ready to 
deliver your Information Management solution in an efficient and cost effective 
manner to accelerate your Return On Investment. 

The Information Management Software Services team offers a broad range of 
planning, custom education, design engineering, implementation and solution 
support services. Our consultants have deep technical knowledge, industry 
skills, and delivery experience from thousands of engagements worldwide. 

With each engagement, our objective is to provide you with a reduced risk, and 
expedient means of achieving your project goals. Through repeatable services 
offerings, capabilities, and best practices leveraging our proven methodologies 
for delivery, our team has been able to achieve these objectives and has 
demonstrated repeated success on a global basis. 

The key services resources listed in Table 1-3 are available for InfoSphere
Warehouse. 


Table 1-3 InfoSphere Warehouse services

| Information Management Services offering | Short description |
|---|---|
| IBM Smart Analytics System Services | Foundation services delivered as part of the turn-key hardware and software solution; the goal is to deliver a data warehouse in a table-ready state. |
| InfoSphere Warehouse Data Mining | A rapid deployment service for data mining that focuses on a specific business case and limited sources of data for existing InfoSphere Warehouse customers. |
| InfoSphere Warehouse Data Migration | The objective is to migrate the data from an existing DB2 warehouse to a new InfoSphere Warehouse (that is, the IBM Smart Analytics System) leveraging the IBM Data Movement Tool (IM Lab Services Asset). |
| Capacity Planning for an existing Data Warehouse | The objective is to evaluate the current DB2 warehouse environment, to understand the new goals, and to provide guidelines for updated design, hardware, and software configuration to meet those goals. |
| InfoSphere Warehouse HealthCheck | This service includes a complete review of the database configuration, the operating system, the storage subsystem, and operational considerations. |
| Data Warehouse Performance Optimization | This service responds to a specific performance problem. The scope is limited to the analysis of a maximum of three load processes or a maximum of five slow queries. |


For more information, visit our website: 
http://www.ibm.com/software/data/services 

1.3.3 IBM Software Accelerated Value Program


The IBM Software Accelerated Value Program provides support assistance for
issues that fall outside normal “break-fix” parameters addressed by the standard 
IBM support contract, offering customers a proactive approach to support 
management and issue resolution assistance through assigned senior IBM 
support experts who know your software and understand your business needs. 
Benefits of the Accelerated Value Program include: 

► Priority access to assistance and information 

► Assigned support resources 

► Fewer issues and faster issue resolution times 

► Improved availability of mission-critical systems 

► Problem avoidance through managed planning 

► Quicker deployments 

► Optimized use of in-house support staff 

To learn more about the IBM Software Accelerated Value Program, visit our website:
http://www.ibm.com/software/data/support/acceleratedvalue/

To talk to an expert, contact your local Accelerated Value Sales Representative
at this website:

http://www.ibm.com/software/support/acceleratedvalue/contactus.html


1.3.4 Protect your software investment: Ensure that you renew your 
Software Subscription and Support 

Complementing your software purchases, Software Subscription and Support
gives you access to our world-class support community and to product upgrades
with every new license. Extend the value of your software solutions and
transform your business to be smarter, more innovative, and cost-effective when 
you renew your Software Subscription and Support. Staying on top of on-time 
renewals ensures that you maintain uninterrupted access to innovative solutions 
that can make a real difference to your company's bottom line. 

To learn more, visit: 

http://www.ibm.com/software/data/support/subscriptionandsupport 

Installation and configuration


In this chapter we describe the installation, configuration, and deployment 
processes of an IBM Smart Analytics System. We provide a brief explanation of 
the planning process, highlighting the details to be considered before IBM builds 
an IBM Smart Analytics System. We also describe the installation process 
conducted at the IBM Customer Solution Center and at the customer’s data 
center. 


© Copyright IBM Corp. 2011. All rights reserved.

2.1 Planning


The IBM Smart Analytics System offerings are pre-integrated systems with the 
installation and configuration conducted at the IBM Customer Solution Center 
(CSC) based on the information collected from the customer with the assistance 
of IBM specialists. The system is then shipped and deployed to the customer 
site. 


Most of the information required for building the system is collected and kept in 
the IBM Smart Analytics System customer worksheet. In addition, IBM and 
customer architects will design a floor diagram that describes server placement 
and data center environment requirements. 

In this section, we briefly describe the information collected for building an IBM 
Smart Analytics System. We highlight the details that should be considered 
during this planning stage to ensure a smooth deployment. 

2.1.1 Smart Analytics System customer worksheet

The customer worksheet provides the baseline for the IBM Smart Analytics 
System build and deployment. Each IBM Smart Analytics System model has a 
customer worksheet designed specifically for that model. Though the information 
collected can be similar, the worksheet layouts vary. 


Offerings: This book focuses only on the IBM Smart Analytics System 5600, 
7600, and 7700 offerings. These offerings are all based on IBM DB2 for Linux, 
UNIX, and Windows. 


The information required for building an IBM Smart Analytics System can be briefly
categorized into four main groups: 

► Server information 

► Network information 

► Database and operating system configuration information 

► Data center and system delivery information 

Server information 

The customer worksheet specifies the information about all management, 
administration, data, standby, warehouse applications, and business intelligence 
nodes in the system. 



Figure 2-1 shows an example of the IBM Smart Analytics System component 
configuration section for IBM Smart Analytics System 7700. 


(Worksheet excerpt: IBM Smart Analytics System 7700 R1.0 Customer Worksheet, Version 1.3.4, last updated Sept 12, 2010. Required fields are shown in green and optional fields in light blue. The equipment list includes SAN40B switches, 1 GbE network switches, and 10 GbE network switches.)

Figure 2-1 Components information listed on customer worksheet for IBM Smart Analytics System 7700 


Network information 

The network information section specifies the corporate network used by 
administrators, end users, and client applications that will access the IBM Smart 
Analytics System. The required information includes IP addresses, host names,
gateway, and domain name server (DNS) information.

There are various types of networks used in the IBM Smart Analytics System 
offerings that can be categorized as follows: 

► Internal networks such as the DB2 Fast Communications Manager (FCM) 
network and, on AIX only, the Hardware Management Console (HMC) 
network 

► Public networks such as customer corporate and application networks 

Plan the IP addresses with future expansion in mind. The IBM Smart Analytics 
System grows on a modular basis to scale out the environment, so the 
network information must be prepared to 
accommodate the new modules. 
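
As a rough illustration of planning addresses for future expansion, the 
following Python sketch checks whether a subnet can hold the current nodes 
plus a block reserved for modules added later. The subnet, the node counts, and 
the function name plan_addresses are hypothetical values chosen for 
illustration; they do not come from the customer worksheet. 

```python
import ipaddress


def plan_addresses(subnet, current_nodes, reserved_for_growth):
    """Split the usable addresses of a subnet into assigned and reserved lists.

    Raises ValueError when the subnet cannot hold both sets of addresses.
    """
    hosts = list(ipaddress.ip_network(subnet).hosts())
    needed = current_nodes + reserved_for_growth
    if needed > len(hosts):
        raise ValueError(
            f"{subnet} has {len(hosts)} usable addresses, {needed} needed"
        )
    assigned = hosts[:current_nodes]
    reserved = hosts[current_nodes:needed]
    return assigned, reserved


# Hypothetical example: six nodes today, room reserved for ten more.
assigned, reserved = plan_addresses("192.0.2.0/27", 6, 10)
print("assigned:", assigned[0], "to", assigned[-1])
print("reserved:", reserved[0], "to", reserved[-1])
```

Reserving a contiguous block up front keeps the addresses of later modules 
adjacent to the original ones, which can simplify routing and firewall rules. 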

Figure 2-2 shows an example of the fields for host names, IP addresses, DNS, 
and gateway information on a customer worksheet for 
IBM Smart Analytics System 7700. 



Figure 2-2 The network information listed on customer worksheet 


Database and operating system configuration information 

The database and operating system configuration information collected includes the 
operating system users, groups, user IDs, and group IDs. The DB2 instance 
and database information, with details about IBM InfoSphere Warehouse users 
and groups, is also required because the system deployment will install and 
configure a DB2 instance and will create the customer database. 

UIDs and GIDs: Be sure to provide user identification numbers (UIDs) and 
group identification numbers (GIDs) that are not already used in the existing 
enterprise. Unused IDs avoid conflicts when the IBM Smart Analytics System is 
deployed and allow a seamless integration of the IBM Smart Analytics System 
with the existing customer environment. 
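
One way to check proposed IDs before filling in the worksheet is sketched 
below in Python. It assumes the existing accounts are available as text in the 
standard passwd format (name:password:UID:GID:...). The function names, the 
account names, and the sample data are hypothetical. A complete check would 
also need the group database and any directory service in use. 

```python
def used_ids(passwd_text):
    """Collect the UIDs and primary GIDs found in passwd-format text."""
    uids, gids = set(), set()
    for line in passwd_text.splitlines():
        fields = line.strip().split(":")
        # Skip blank lines, comments, and entries without numeric IDs.
        if len(fields) < 4 or not (fields[2].isdigit() and fields[3].isdigit()):
            continue
        uids.add(int(fields[2]))
        gids.add(int(fields[3]))
    return uids, gids


def find_conflicts(proposed, passwd_text):
    """Return the proposed (name, uid, gid) entries whose UID or GID is in use."""
    uids, gids = used_ids(passwd_text)
    return [(n, u, g) for (n, u, g) in proposed if u in uids or g in gids]


# Hypothetical sample data for illustration only.
existing = """root:x:0:0:root:/root:/bin/sh
appuser:x:1500:1500::/home/appuser:/bin/sh
"""
proposed = [("dbinst", 1500, 2000), ("dbadmin", 1601, 1601)]
print(find_conflicts(proposed, existing))
```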


Figure 2-3 shows an example of the customer worksheet section for the user and 
group information for an IBM Smart Analytics System 7700. 



Figure 2-3 Example of user and group information listed on customer worksheet 


Data center and system delivery information 

The data center and delivery information section specifies the shipping 
information and data center details for deploying the system, such as the 
position of the cable racks in the customer data center (overhead or underfloor). 

Figure 2-4 shows an example of the data center and delivery information section for 
IBM Smart Analytics System 7700. 


[Worksheet image: STEP 8: Data Center and Delivery Information. The fields 
cover power supply and cable management (where the fibre cables and power 
supply cords run in the data center), customer name, delivery address, the 
primary, installation, and alternate contacts with their telephone numbers, 
and delivery access details (appointment required, secured facility, loading 
dock, truck size options, gate required, elevator). The worksheet notes a 
weight per rack of approximately 1800 lbs.] 


Figure 2-4 Data center and delivery information listed on customer worksheet 


Hardware Management Console and the Remote Support 
Manager 

The Hardware Management Console (HMC) is a required component of IBM 
Smart Analytics System 7600 and 7700 (AIX based) and is supported on these 
two configurations only. The Remote Support Manager for Storage is a required 
component of the 7600 and 7700, and is optional for the 5600 offering. 

The IBM Smart Analytics System can take advantage of features such as the 
call home support of the Hardware Management Console (HMC) and the 
Remote Support Manager (RSM) for IBM Storage Systems. Provide the information 
required to implement these components before the system is deployed, so that 
the system is ready to take proactive actions as soon as it is installed at 
the customer site. 

Figure 2-5 illustrates an example of the HMC optional settings for IBM Smart 
Analytics System 7700 configurations. 


[Worksheet image: Preinstallation configuration worksheet for the HMC. 
Completing this optional worksheet allows IBM to configure call-home 
connectivity and notification for the HMC. The fields include the HMC host 
name (for example, smashmc01 and smashmc02), domain name, description of the 
HMC, gateway address (for example, 9.10.23.1), gateway device, DNS enablement 
with the DNS server search order, and the domain suffix search order. For more 
information about planning, installing, and configuring the HMC, the worksheet 
refers to 
http://publib.boulder.ibm.com/infocenter/powersys/v3r1m5/topic/iphai/iphai.pdf] 


Figure 2-5 HMC information listed on Customer Worksheet for IBM Smart Analytics System 7700 


2.1.2 Floor diagram and specification review 

As part of the installation planning, the IBM team works with the customer to 
build a rack diagram with all IBM Smart Analytics System components and 
prepare an environment worksheet that specifies the environment requirements, 
such as the floor space, power, and cooling needed for the IBM Smart Analytics 
System components. 

When the system is delivered, the physical space should be available with all the 
environment requirements fulfilled for the deployment process. 


2.2 Installation of IBM Smart Analytics System 


The installation service is part of the IBM Smart Analytics System contract 
package and is governed by an IBM Statement of Work. After the order is 
placed and all required information is provided on the customer worksheet, the 
IBM Smart Analytics System assembly process is ready to start. This 
process can be divided into two main activities: 

► The IBM Customer Solution Center (CSC) assembles, installs, configures, and 
tests the system. The 
integrated system is then shipped to the customer’s data center. 

► At the customer’s data center, the racks are reconnected and tested to ensure 
that the system meets the specifications. 


2.2.1 Installation at the IBM Customer Solution Center 

The customer worksheet generates the input files needed by the automatic 
installer to properly deploy the custom IBM Smart Analytics System ordered. 

The following main activities are performed at the IBM facility: 

► Cabling, installing, and testing following the IBM Smart Analytics System 
specification: 

The IBM specialists will prepare the servers in the rack, do the initial cabling, 
and install the base software for the system. 

► Storage subsystem configuration: 

The IBM Storage subsystem will be configured following the IBM Smart 
Analytics System practices. It will also be loaded with the validated software 
and microcode stack. 

► System optimization: 

The IBM specialists will bring the system to the IBM Smart Analytics System 
standard configuration. The system settings, such as operating system and 
storage parameters, will then be applied. 

► Database configuration: 

Following the information given by the customer, the customer database will 
be created. This base setup follows the IBM Smart Analytics System best 
practices. 

► Quality assurance: 

When the system is ready, it will be thoroughly tested and checked to confirm that it 
meets the IBM Smart Analytics System standards. 

► Installation report: 

At the end of the installation and configuration, a report will be generated with 
all the information gathered during the installation process. The report also 
documents all the system information, from the architecture overview to the 
network point-to-point diagram. This report will be updated after the 
deployment process at the customer site before the system is turned over to 
the customer. Use the installation report as a reference when 
contacting IBM Customer Support with any questions regarding the 
system. 

► Packing and shipping to the customer: 

When the system is properly installed, tested, and documented, it is ready to 
be shipped to the customer site. 


2.2.2 Installation at the customer site 

Before an IBM Smart Analytics System is ordered, IBM works with the 
customer to evaluate the data center requirements for the system. When the 
system arrives at the customer site, the data center should be ready for 
deploying the servers. IBM coordinates with the customer to perform this final 
installation task. 

The main activities are as follows: 

► Power up the systems: 

This is the systems startup. At this point, IBM specialists check the health of 
the IBM Smart Analytics System components. 

► Perform internal system cabling: 

The internal cabling is reconnected after the servers are powered up and 
tested by IBM specialists. 

► Rerun the installation and performance tests to ensure the system is 
performing as expected: 

The overall system is thoroughly tested again to ensure the environment is 
performing as expected. 

► Complete the final quality assurance check: 

After the test results are gathered, the final checklist is updated. 

► Perform mentoring and knowledge transfer session: 

IBM professionals conduct a skill transfer session for customers 
about the IBM Smart Analytics System. The information provided in the 
session includes the system overview, the installation report, how to work with 
the new environment, and where to find further information. 

► Update the installation report: 

The installation report receives its final update with all documented results 
and system information. The installation report is given to the customers 
when the system is turned over to them. 

At this point the system is in the “ready-to-load data” state. 


2.3 Documentation and support for the IBM Smart 
Analytics System 

The IBM Smart Analytics System product documentation is a good starting point 
to learn more about the new system. You can download the documentation from 
this web address: 

https://www14.software.ibm.com/webapp/iwm/web/preLogin.do?lang=en_US&source=idwbcu 

The IBM Smart Analytics System documentation provides the following 
information: 

► Managing users and passwords: 

The system is delivered with default passwords. Change these 
passwords according to your IT regulations. 

► Table space design considerations: 

See this topic to learn how to create table spaces. To ease 
system administration tasks, IBM Smart Analytics System takes advantage of the 
DB2 automatic storage feature to manage database space automatically. 

► DB2 workload manager (WLM): 

It is extremely important to implement WLM to protect the system from 
overload and rogue queries, and to ensure that the SLA requirements are met. To 
manage the system workload, IBM Smart Analytics System uses the DB2 
WLM feature. 


IBM Smart Analytics System installation report 

During the installation and deployment of the IBM Smart Analytics System, many 
details are generated about the system architecture, servers, network, operating 
system, software stack, performance tests, and so on. All this information is 
collected and documented in a worksheet called the IBM Smart Analytics System 
installation report. 

The installation report includes information such as the following: 

► Architecture and hardware profile 

► Rack diagram 

► Software stack 

► Networking configuration 

► Storage configuration 

► High Availability configuration 

► Network point-to-point 

► Fiber point-to-point 

This report is delivered by IBM when the system is turned over to the customer. 
Update the installation report whenever the system is changed, for example, when 
database parameter settings are modified. The report can also be used as a 
reference, and updated, by the IBM Smart Analytics System Health Check services. 

The IBM Smart Analytics System installation report provides a single view of the 
entire system stack (hardware, software, and configuration). This report is 
useful, for example, when you open a Problem Management Report (PMR). 

When opening a PMR, have as much information as you can about the situation 
and the environment. For example, to report a disk problem, you need to 
provide the type, model, and serial number (S/N) of the storage system and disk 
enclosure. This information can be found in the installation report. If RSM is 
enabled, it can handle these tasks, including opening the PMR. 

If you need a copy of the installation report, contact IBM Smart 
Analytics Customer Support. 

For the most up-to-date information about IBM Smart Analytics System, see the IBM 
website at this address: 

http://www.ibm.com/software/data/infosphere/smart-analytics-system/ 


High availability 


The IBM Smart Analytics System offerings each include high availability 
solutions that automate failover from any active node to another node in the 
cluster. Together with the many redundant hardware components in the system, 
these high availability features can minimize the downtime caused by many 
hardware and software problems. 

In this chapter we describe the high availability characteristics of the 
IBM Smart Analytics System. 


3.1 High availability on IBM Smart Analytics System 


The IBM Smart Analytics System offers high availability capabilities through 
both hardware and software features. The IBM Smart Analytics System is 
designed with redundant hardware components to help minimize downtime 
by avoiding single points of failure in any one hardware component. The following 
components are designed with redundancy: 

► Disk mirroring for internal storage (mirrored volume groups) 

► RAID disk arrays for external storage 

► Dual port Fibre Channel adapters 

► Redundant SAN switches 

► Dual-port network adapters 

► Redundant network switches 

► Dual active RAID controllers 

► Dual hot-swappable power/cooling units 

The IBM Smart Analytics System utilizes IBM Tivoli System Automation for 
Multiplatforms (SA MP) to provide or extend the high availability features at the 
software or application level. IBM Tivoli SA MP integration provides the capability 
to take specific actions when a detectable resource failure occurs. This action 
can be as simple as restarting a software component or moving a software 
component to the standby node. A resource failure can include: 

► Network failure 

► A server failure caused by an accidental reboot or a power failure 

► DB2 instance crash 

► Database partition failure 

There is a separate Tivoli System Automation for Multiplatforms high availability 
configuration for each of the following IBM server groups: 

► Core warehouse servers that host the DB2 database partitions on which the 
data resides 

► The warehouse applications modules that host the IBM InfoSphere 
Warehouse application 

► The business intelligence (BI) module that hosts the IBM Cognos components 

To learn more about managing high availability for the IBM Smart Analytics 
System, see this IBM training course: 

https://www-304.ibm.com/jct03001c/services/learning/ites.wss/us/en?pageType=course_description&courseCode=DW330 


3.2 IBM Tivoli System Automation for Multiplatforms on 
IBM Smart Analytics System 

IBM Tivoli SA MP manages and provides automated recovery for the IBM Smart 
Analytics System components that require high availability. The infrastructure of 
Tivoli System Automation for Multiplatforms is based on the Reliable Scalable 
Cluster Technology (RSCT), which is an IBM software product that provides a 
highly available and scalable clustering environment for applications and 
businesses running on AIX and Linux platforms. Tivoli SA MP allows you to 
configure high availability systems through the use of policies that define the 
relationships among the various components. After the relationships are 
established, Tivoli SA MP assumes responsibility for managing the resources on 
the specified nodes as configured in the policies. When a resource fails, Tivoli 
System Automation for Multiplatforms can quickly and consistently perform a 
restart either on the same server or on the standby server. 

The relationships among the resources managed by Tivoli SA MP are controlled 
cluster-wide. If an application needs to be moved from one server to another, Tivoli 
System Automation for Multiplatforms automatically handles the start and stop 
sequences, node requirements, dependencies, and any further follow-on actions. 
You can group the resources managed by Tivoli System Automation for 
Multiplatforms and establish relationships among the members of the group, such as 
a location or start/stop relationship. After resources are grouped, an operation 
can be issued against the resource group as a single entity and is 
applied to the entire group. 

In a Tivoli SA MP configuration, a set of nodes in the system, commonly called a 
cluster, is referred to as a peer domain. All nodes in a peer domain continually send 
and receive heartbeats over communication groups. A communication group is a 
set of nodes that can talk to each other over a common communication medium. 
An example of a communication group would be network interface cards residing 
on various nodes connected to the same network. 

The following Tivoli SA MP terms are frequently seen in the IBM Smart Analytics 
System configuration: 

► Peer domain: A peer domain or cluster is a group of host systems where the 
Tivoli System Automation for Multiplatforms managed resources reside. A 
peer domain can consist of one or more systems or nodes. 

► Resource: A resource is any piece of hardware or software that can be 
defined for Tivoli System Automation for Multiplatforms to manage, for 
example, a network interface card or a DB2 database partition. 

► Resource group: A resource group is a set or collection of resources. 

► Relationship: A relationship defines how the resources within a 
cluster relate to each other. There are two types of relationships: 

- Start-stop relationship: This relationship defines the start and stop 
dependencies between resources. 

- Location relationship: This relationship is used when resources must, 
if possible, be started on the same or another node in the cluster. 

► Equivalency: An equivalency is a set of resources that provide the same 
functionality. Tivoli System Automation for Multiplatforms can select any 
resource in the equivalency to provide an operation. On the IBM Smart 
Analytics System, the network adapters that require high availability have 
equivalencies, for example, the DB2 Fast Communications Manager network 
adapters. 

For more information about Tivoli System Automation for Multiplatforms, see the 
IBM Tivoli System Automation for Multiplatforms manual, Administrator’s and 
User’s Guide, SC33-8415-01, at the following web address: 

http://publib.boulder.ibm.com/tividd/td/IBMTivoliSystemAutomationforMultiplatforms3.1.html 


3.3 High availability overview for the core warehouse 
servers 


An active-passive configuration can have a standby node for one or more active 
nodes. Figure 3-1 shows a high availability group configuration with two active 
nodes. 

[Diagram: two active nodes and one standby node] 

Figure 3-1 Two active nodes high availability group configuration 

The IBM Smart Analytics System core warehouse servers have this type of 
active-passive high availability configuration. For a number of administration and 
data nodes, there is a standby node to receive the managed resources from a 
failed server. 

The core warehouse can have one or more high availability groups. Each high 
availability group has a number of administration nodes and data nodes, storage 
servers (IBM DS), SAN switches, and one standby node. When a server failure 
occurs, its storage and workload are moved to the standby node of the high 
availability group. Only one node failure is supported per high availability 
group, and this limit is enforced. 

In a high availability group, after a node has failed over to the standby node, the 
standby node cannot fail over to another node within the same high availability 
group or to a node in another high availability group. 

The production instances or the core warehouse nodes of the IBM Smart 
Analytics System 5600 have the following high availability group configuration: 

► Maximum of five active nodes (administration, user, or data) 

► Ten storage servers (either DS3400 or DS3524) 

► One standby node 

► Two SAN switches (redundant pair) 

For the IBM Smart Analytics System 5600, each high availability group has one 
standby node for one to five administration, user, or data nodes. If a sixth node is 
needed, a new high availability group is started, which requires a second 
standby node. This new standby node manages the next set of five nodes. 
That is, for the IBM Smart Analytics System 5600, each group of five nodes 
always has its own standby node. 

Figure 3-2 illustrates the high availability group for the IBM Smart Analytics 
System 5600 V1. 



Figure 3-2 Example of a high availability group for IBM Smart Analytics System 5600 

The core warehouse nodes of the IBM Smart Analytics System 7600 have the 
following high availability group configuration: 

► Maximum of eight active nodes (administration, user, or data) 

► Two DS5300 storage servers 

► One standby node 

► Two SAN switches (redundant pair) 

For the IBM Smart Analytics System 7600, each high availability group always 
has one standby node for one to eight administration, user, or data nodes. If a 
ninth node is needed, a new high availability group is started, which 
requires a second standby node. This new standby node manages the next 
set of eight nodes. That is, for the IBM Smart Analytics System 7600, each group 
of eight nodes always has its own standby node. 

Figure 3-3 illustrates the high availability group for the IBM Smart Analytics 
System 7600. 



Figure 3-3 Example of a high availability group for IBM Smart Analytics System 7600 


The core warehouse nodes of the IBM Smart Analytics System 7700 have the 
following high availability group configuration: 

► A maximum of four active nodes (administration, user, or data) 

► 13 DS3524 storage servers for one administration node and three data nodes, 
or 16 DS3524 storage servers for four data nodes 

► One standby node 

► Two SAN switches (redundant pair) 

For the IBM Smart Analytics System 7700, each high availability group has one 
standby node for one to four administration, user, or data nodes. If a fifth node is 
needed, a new high availability group is started, which requires a second 
standby node. This new standby node manages the next set of four nodes. That 
is, for the IBM Smart Analytics System 7700, each group of four nodes always 
has its own standby node. 
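
The sizing rule described for the three offerings reduces to a ceiling 
division: one high availability group, and therefore one standby node, for 
every full or partial set of active nodes. The following Python sketch shows 
the arithmetic. The per-model limits come from the text above; the function 
name and the sample node counts are assumptions made for illustration. 

```python
import math

# Maximum active nodes per high availability group, as stated in the text.
MAX_ACTIVE_PER_GROUP = {"5600": 5, "7600": 8, "7700": 4}


def ha_groups_needed(model, active_nodes):
    """Return the number of HA groups, which equals the number of standby nodes."""
    if active_nodes < 1:
        raise ValueError("at least one active node is required")
    return math.ceil(active_nodes / MAX_ACTIVE_PER_GROUP[model])


for model, nodes in [("5600", 5), ("5600", 6), ("7600", 9), ("7700", 4)]:
    groups = ha_groups_needed(model, nodes)
    print(f"{model}: {nodes} active node(s) -> {groups} standby node(s)")
```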

Figure 3-4 illustrates the high availability group for the IBM Smart Analytics System 7700. 



Figure 3-4 Example of a high availability group for IBM Smart Analytics System 7700 

For a high availability (HA) group, when a failure is detected, all managed 
resources are automatically moved from the failing node to the standby node of the 
same high availability group by Tivoli System Automation for Multiplatforms. 

The standby node has the same software stack and code level as the other 
nodes. The standby node is always powered up and ready to assume the 
resources from a failed node. Each HA group is connected to a redundant pair of 
SAN switches so that the standby node is able to access the storage for the 
remaining nodes. When a standby node takes the managed resources, it is able 
to access the storage resources and start the DB2 database partitions. 
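
The single-standby behavior can be pictured with a small model. The Python 
sketch below is a toy illustration only, not how Tivoli SA MP is implemented: 
the class, the method names, and the node names are hypothetical. It shows that 
once the standby node hosts the resources of one failed node, a second 
failover in the same group is rejected until the first node fails back. 

```python
class HAGroup:
    """Toy model of one HA group: active nodes share a single standby node."""

    def __init__(self, active_nodes, standby):
        # Map each primary node to the node currently hosting its resources.
        self.location = {node: node for node in active_nodes}
        self.standby = standby

    def standby_in_use(self):
        return self.standby in self.location.values()

    def fail_over(self, node):
        """Move the resources of a failed node to the standby, if it is free."""
        if self.standby_in_use():
            raise RuntimeError("standby node already hosts a failed node")
        self.location[node] = self.standby

    def fail_back(self, node):
        """Return the resources to their primary node."""
        self.location[node] = node


group = HAGroup(["data01", "data02"], "standby01")
group.fail_over("data01")
print(group.location)
try:
    group.fail_over("data02")
except RuntimeError as err:
    print("second failover rejected:", err)
group.fail_back("data01")
print(group.location)
```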

Clients connect to services in an IBM Smart Analytics System environment 
by using specified host names or IP addresses. 
The IBM Smart Analytics System environment uses service IPs 
that are always associated with those services regardless of which server hosts 
them. The service IPs are managed as Tivoli SA MP resources. 




To allow Tivoli SA MP to manage the resources, each resource in the IBM Smart 
Analytics System must belong to at least one resource group. All resources 
are organized into a hierarchy of resource groups. The top resource group in the 
hierarchy is at the node level. Below this level, resources are grouped based on 
their dependencies and requirements, as follows: 

► The resources required by the DB2 database partition are grouped in a 
partition-level resource group. 

► A volume group and its file systems are defined as a resource group. 

► All resources hosted under the same node are also grouped as a resource 
group. 

Typically, each DB2 database partition has a resource group associated with it, 
and each resource group contains all required resources: 

► DB2 database partition resource: For DB2 instance 

► File systems resources: For database directory, logs, and table space 
containers 

► Service IPs: For virtual IPs moved from one primary node to the standby node 

For example, for the IBM Smart Analytics System 7700 administration node, the 
node-level resource group has the DB2 coordinator partition and the volume 
group resource group of the file systems associated with the coordinator 
partition. For the first data node, the node-level resource group has the eight 
database partitions it hosts and the volume group resource group of the table 
space containers. 
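
The hierarchy for the first data node can be sketched as a nested data 
structure. The following Python example is an illustration only: the class 
and all resource and group names are hypothetical and are not the names that 
Tivoli SA MP assigns. It models a node-level group that contains eight 
partition-level groups and one volume group resource group. 

```python
from dataclasses import dataclass, field


@dataclass
class ResourceGroup:
    name: str
    resources: list = field(default_factory=list)  # leaf resource names
    children: list = field(default_factory=list)   # nested ResourceGroup objects

    def all_resources(self):
        """Return every resource in this group and in its nested groups."""
        found = list(self.resources)
        for child in self.children:
            found.extend(child.all_resources())
        return found


# Eight database partitions on the first data node, each in its own group.
partition_groups = [
    ResourceGroup(f"partition{n}_rg", resources=[f"db2_partition_{n}"])
    for n in range(1, 9)
]
# One resource group for the volume group that holds the table space containers.
volume_rg = ResourceGroup("datanode1_vg_rg", resources=["tablespace_fs"])
# The node-level group sits at the top of the hierarchy.
node_rg = ResourceGroup("datanode1_rg", children=partition_groups + [volume_rg])

print(node_rg.all_resources())
```

Because an operation on a group applies to everything beneath it, acting on 
the node-level group reaches all partitions and file systems on that node. 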

The Tivoli SA MP resources for the network interfaces are automatically defined 
when the resource domain is created. On the IBM Smart Analytics System, the 
domain for the core warehouse is named bcudomain by default. After being 
defined, the resources are grouped in equivalencies. On top of the equivalencies, 
a dependency relationship is created between the DB2 database partitions and 
the high availability networks. The network resources are automatically started 
during the operating system boot. 

In addition to the network interface resources, the DB2 instance home directory 
is also defined as an equivalency. For the IBM Smart Analytics System 7700 and 
7600, this directory is on a General Parallel File System (GPFS) file system and has a 
dependency relationship with the database partition resources. This dependency 
relationship prevents the resources from being started if the home directory is not 
available. 

On the IBM Smart Analytics System 5600, there is an NFS-related 
resource group for the DB2 instance home directory. There are also dependency 
relationships defined between the DB2 database partition resources and the NFS server 
(DB2 instance home directory). These dependencies prevent Tivoli SA MP from 
starting a database partition if the NFS server is not available. 
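
The effect of these dependency relationships can be illustrated with a short 
Python sketch. It is a simplified model under stated assumptions, not the 
Tivoli SA MP policy format: the function name and the resource names 
(db2home, fcm_network, db2_partition_1) are hypothetical. A partition is 
allowed to start only when everything it depends on is online. 

```python
def can_start(resource, depends_on, online):
    """A resource may start only when all of its dependencies are online."""
    return all(dep in online for dep in depends_on.get(resource, []))


# Hypothetical dependencies: each partition needs the instance home directory
# (GPFS on the 7600 and 7700, NFS on the 5600) and the high availability network.
depends_on = {
    "db2_partition_1": ["db2home", "fcm_network"],
    "db2_partition_2": ["db2home", "fcm_network"],
}

online = {"fcm_network"}
print(can_start("db2_partition_1", depends_on, online))  # home directory missing

online.add("db2home")
print(can_start("db2_partition_1", depends_on, online))  # all dependencies online
```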


3.4 Managing high availability resources for the core 
warehouse 

IBM Smart Analytics System provides a high availability management toolkit 
with a set of scripts for managing the high availability configuration. The scripts 
are located in the /usr/SmartAnalytics/ha_tools directory on the management 
node. The root user can run each script as a command from the management 
node, or copy the script to other nodes and run it from there. These scripts manage 
the servers for the core warehouse (administration, data, and user nodes) only. 

The following scripts are used for managing the high availability configuration for 
the core warehouse: 

► hals: 

This command lists the DB2 database partition resources and their high 
availability status. 

Syntax: hals 

► hastartdb2: 

This command starts the DB2 database partition resources. To start the 
resources for a specific node, the node name list must be specified as the 
nodelist argument. If a node name is not specified, the command is applied 
to all resources. This command starts the DB2 instance service and also 
mounts the volume group resources. 

Syntax: hastartdb2 [nodelist] 

► hastopdb2: 

This command stops the DB2 database partition resources. To stop the 
resources for a specific node, the node name list must be specified as the 
nodelist argument. If a node name is not specified, the command is applied 
to all resources. This command stops the DB2 instance service and 
unmounts the volume group resources. 

Syntax: hastopdb2 [nodelist] 

► hafailover: 

This command manually fails over a node to the standby node. The command 
moves the DB2 database partition resources from the node specified in the 
command argument to the standby node. 

Syntax: hafailover nodename 

► hafailback: 

This command manually fails back a node. It moves the DB2 database 
partition resources back from the standby node to the primary node specified in 
the command argument. 

Syntax: hafailback nodename 

► hareset: 

This command attempts to reset the resources with the Pending, Failed, or 
Stuck nominal state. This command stops the high availability scripts to 
prevent the DB2 instance from being started. Next, the resources are reset. 

If the optional nooffline argument is not specified, the nominal state for all 
resources is changed to Offline. If the argument is specified, the command 
attempts to change the nominal state only for the resources that are in the 
Pending, Failed, or Stuck state. 

Syntax: hareset [nooffline] 

On the IBM Smart Analytics System 5600, additional scripts are provided for 
managing the NFS server high availability resource: 

► hastartnfs: 

This command starts the NFS services and attempts to mount the /db2home 
file system for all nodes. If the argument nomount is specified, it starts the NFS 
services only but does not mount the /db2home file system. Always start the 
NFS services before you start the DB2 database partition resources. 

Syntax: hastartnfs [nomount] 

► hastopnfs: 

This command unmounts the /db2home file system on all nodes. If the 
argument force is specified, it stops the DB2 database partition resources 
that are still running, then unmounts the /db2home file system. 

Syntax: hastopnfs [force] 
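
The ordering rule for the 5600 (start the NFS services before the DB2 database 
partition resources) can be captured in a small helper. The Python sketch 
below only builds the ordered list of toolkit commands and does not run them. 
The function names are hypothetical, and the shutdown order shown (DB2 
resources first, then NFS) is an assumption inferred from the startup rule 
rather than a documented procedure. 

```python
def startup_commands(model):
    """Return the toolkit commands, in order, to start the core warehouse."""
    commands = []
    if model == "5600":
        # The 5600 hosts the DB2 instance home directory on NFS, so the NFS
        # services are started before the DB2 partition resources.
        commands.append("hastartnfs")
    commands.append("hastartdb2")
    commands.append("hals")  # list the resource status afterwards
    return commands


def shutdown_commands(model):
    """Return the commands in the reverse order (assumed, see the lead-in)."""
    commands = ["hastopdb2"]
    if model == "5600":
        commands.append("hastopnfs")
    return commands


for model in ("5600", "7700"):
    print(model, "start:", startup_commands(model))
    print(model, "stop: ", shutdown_commands(model))
```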

3.4.1 Monitoring the high availability resources for the core 
warehouse 


To monitor the resources and nodes for the core warehouse, you can use either 
the scripts in the high availability management toolkit (hals and hacknode) or 
Tivoli SA MP commands. 

Monitoring the high availability using the toolkit 

You can use the hals command to check the resource status on an IBM Smart
Analytics System environment. The toolkit scripts are available only for the
core warehouse servers.

The hals command returns a summary of the resources and their nominal states
for the entire cluster.

Example 3-1 shows output from a hals command run on an IBM Smart Analytics 
System 7600. It has one administration node, one user node, and two data 
nodes. 


Example 3-1 IBM Smart Analytics System 7600 hals output

| PARTITIONS | PRIMARY    | SECONDARY   | CURRENT LOCATION | RESOURCE OPSTATE | HA STATUS |
| 1,2,3,4    | dataNode01 | standbyNode | dataNode01       | Online           | Normal    |
| 5,6,7,8    | dataNode02 | standbyNode | dataNode02       | Online           | Normal    |
| 0          | adminNode  | standbyNode | adminNode        | Online           | Normal    |
| 990        | userNode   | standbyNode | userNode         | Online           | Normal    |


The following fields are shown in this hals command output:

► Partitions: Shows a list of DB2 database partitions for the node in the node 
level group. 

► Primary: Shows the primary node for the DB2 database partition level 
resources. 

► Secondary: Shows the secondary node or standby for the DB2 database 
partition level resources. 

► Current location: Shows the current node location for the DB2 database 
partition level resources where the resources are running at the moment. 

► Resource opstate: Shows the nominal state for the node level resource group. 
The following nominal states are possible: 

- Online: This state is shown when all resources at the node level resource 
group are online. 

- Offline: This state is shown when all resources at the node level resource 
group are offline. 
- Pending: When the resources are in a pending state, the state appears as
either “Pending Offline” or “Pending Online”.

► HA status: Shows the current status for the node. If the resources are running
at their primary location, the status is “Normal”. If the resources are
running on the standby node, the status is “Failover”. The status can also
be “Pending” if the resources are in either the “Pending Online” or the
“Pending Offline” state, or “Stuck” if the resources get into a stuck
operational state.

In this example, all resources are running at their primary location and the HA
status of every node is Normal.
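
The field descriptions above lend themselves to a simple automated check. The following Python sketch is illustrative only. It assumes that the hals output has been captured as text and that the columns are separated by pipe characters in the order shown in Example 3-1. The names parse_hals and needs_attention are hypothetical helpers and are not part of the high availability management toolkit.

```python
from typing import Dict, List

# Column order as described for the hals output (assumed layout).
COLUMNS = ["partitions", "primary", "secondary", "current", "opstate", "ha_status"]

# Sample text modeled on the hals examples; the admin node is failed over.
SAMPLE = """\
| PARTITIONS | PRIMARY    | SECONDARY   | CURRENT LOCATION | RESOURCE OPSTATE | HA STATUS |
| 1,2,3,4    | dataNode01 | standbyNode | dataNode01       | Online           | Normal    |
| 5,6,7,8    | dataNode02 | standbyNode | dataNode02       | Online           | Normal    |
| 0          | adminNode  | standbyNode | standbyNode      | Online           | Failover  |
| 990        | userNode   | standbyNode | userNode         | Online           | Normal    |
"""


def parse_hals(text: str) -> List[Dict[str, str]]:
    """Turn pipe-separated hals rows into dictionaries, skipping the header."""
    rows = []
    for line in text.splitlines():
        cells = [c.strip() for c in line.strip().strip("|").split("|")]
        # Ignore blank lines, border lines, and the header row.
        if len(cells) != len(COLUMNS) or cells[0].upper() == "PARTITIONS":
            continue
        rows.append(dict(zip(COLUMNS, cells)))
    return rows


def needs_attention(rows: List[Dict[str, str]]) -> List[Dict[str, str]]:
    """Return rows that are not Normal or not running at their primary node."""
    return [r for r in rows
            if r["ha_status"] != "Normal" or r["current"] != r["primary"]]


if __name__ == "__main__":
    for row in needs_attention(parse_hals(SAMPLE)):
        print(row["primary"], "->", row["current"], row["ha_status"])
```

With the sample text, only the row for the administration node is reported, because its resources are running on the standby node.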

Monitoring the high availability using Tivoli SA MP 

The Tivoli SA MP native commands are tools for monitoring the high availability 
of the IBM Smart Analytics System environments. 

To obtain a more complete output, run the commands using the root user. Using 
the DB2 instance owner user to run the Tivoli SA MP commands will not show 
the locked state of a resource. 

The command used for monitoring the resources status is lssam. It shows all the
resources, resource groups, the current state of the resource or group (OpState),
the desired state (Nominal State), and other useful information.

Example 3-2 shows the output format of the lssam command.

Example 3-2 Example of an lssam command output format

<OpState> IBM.ResourceGroup:<ResourceGroup> Nominal=<NominalState>
        |- <OpState> <ResourceClass>:<Resource>
                |- <OpState> <ResourceClass>:<Resource>:<NodeName>
                |- <OpState> <ResourceClass>:<Resource>:<NodeName>


Here, the various parameters have the following meanings: 

► <OpState>: The current operational state of the resource or resource group,
for example, Online or Offline.

► <ResourceGroup>: The resource group name. 

► <NominalState>: The nominal state for the resource group. 

► <ResourceClass>: The resource type which can be, for example,
IBM.Application or IBM.ServiceIP on the IBM Smart Analytics System
environments.

► <Resource>: The resource name. 

► <NodeName>: The host name for the node holding the resources. 

Example 3-3 illustrates an output excerpt of lssam showing the administration
node information of an IBM Smart Analytics System 7700. 

Example 3-3 lssam output for an administration node 


Online IBM.ResourceGroup:db2_bcuaix_server_AdminNode-rg Nominal=Online
        |- Online IBM.ResourceGroup:db2_bcuaix_0-rg Nominal=Online
                |- Online IBM.Application:db2_bcuaix_0-rs
                        |- Offline IBM.Application:db2_bcuaix_0-rs:AdminNode
                        '- Online IBM.Application:db2_bcuaix_0-rs:StandbyNode1
                |- Online IBM.Application:db2mnt-db2fs_bcuaix_NODE0000-rs
                        |- Offline IBM.Application:db2mnt-db2fs_bcuaix_NODE0000-rs:AdminNode
                        '- Online IBM.Application:db2mnt-db2fs_bcuaix_NODE0000-rs:StandbyNode1
                |- Online IBM.Application:db2mnt-db2mlog_bcuaix_NODE0000-rs
                        |- Offline IBM.Application:db2mnt-db2mlog_bcuaix_NODE0000-rs:AdminNode
                        '- Online IBM.Application:db2mnt-db2mlog_bcuaix_NODE0000-rs:StandbyNode1
                |- Online IBM.Application:db2mnt-db2path_bcuaix_NODE0000-rs
                        |- Offline IBM.Application:db2mnt-db2path_bcuaix_NODE0000-rs:AdminNode
                        '- Online IBM.Application:db2mnt-db2path_bcuaix_NODE0000-rs:StandbyNode1
                |- Online IBM.Application:db2mnt-db2plog_bcuaix_NODE0000-rs
                        |- Offline IBM.Application:db2mnt-db2plog_bcuaix_NODE0000-rs:AdminNode
                        '- Online IBM.Application:db2mnt-db2plog_bcuaix_NODE0000-rs:StandbyNode1
                '- Online IBM.ServiceIP:db2ip_172_23_1_111-rs
                        |- Offline IBM.ServiceIP:db2ip_172_23_1_111-rs:AdminNode
                        '- Online IBM.ServiceIP:db2ip_172_23_1_111-rs:StandbyNode1
Online IBM.ResourceGroup:db2_bcuaix_AdminNode_vg-rg Nominal=Online
        '- Online IBM.Application:db2_bcuaix_vgpO
                |- Offline IBM.Application:db2_bcuaix_vgpO:AdminNode
                '- Online IBM.Application:db2_bcuaix_vgpO:StandbyNode1


The lssam command output organizes the resources and group information by
indentation. In the foregoing example, we have the following resources:

► The first level (the leftmost line) is the node level resource group,
db2_bcuaix_server_AdminNode-rg. The naming convention includes the DB2
instance name and the node host name.

► The second level is the partition level resource group, db2_bcuaix_0-rg.
The naming convention includes the DB2 instance name and the database
partition number.

► The third level is the resources. The resources are listed by resource types
(Application, ServiceIP). The naming convention includes the node that holds
the resource, for example, db2_bcuaix_0-rs and db2ip_172_23_1_111-rs.
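
The output format described in this section can be split into its parts with a short script. The following Python sketch is an illustration under stated assumptions: it expects each lssam line to follow the pattern <OpState> <ResourceClass>:<Resource>[:<NodeName>] with an optional Nominal=<NominalState> suffix and an optional tree prefix such as "|- " or "'- ". The names Entry and parse_line are hypothetical and are not provided by Tivoli SA MP.

```python
import re
from typing import NamedTuple, Optional

# Pattern for one lssam line (assumed layout, see the lead-in).
LINE = re.compile(
    r"^[\s|'-]*"                            # tree prefix such as "|- " or "'- "
    r"(?P<opstate>[A-Za-z][A-Za-z ]*?)\s+"  # operational state, may be two words
    r"(?P<cls>IBM\.\w+):"                   # resource class, e.g. IBM.Application
    r"(?P<name>[^:\s]+)"                    # resource or resource group name
    r"(?::(?P<node>\S+))?"                  # optional node name
    r"(?:\s+Nominal=(?P<nominal>\w+))?\s*$" # optional nominal state
)


class Entry(NamedTuple):
    opstate: str
    cls: str
    name: str
    node: Optional[str]
    nominal: Optional[str]


def parse_line(line: str) -> Optional[Entry]:
    """Return the parsed fields of one lssam line, or None if it does not match."""
    m = LINE.match(line)
    if m is None:
        return None
    return Entry(m["opstate"], m["cls"], m["name"], m["node"], m["nominal"])


if __name__ == "__main__":
    print(parse_line("Online IBM.ResourceGroup:db2_bcuaix_0-rg Nominal=Online"))
    print(parse_line("        |- Offline IBM.Application:db2_bcuaix_0-rs:AdminNode"))
```

Lines that carry a node name describe where a resource can run. Lines without a node name describe the resource or resource group itself.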

3.4.2 Starting and stopping resources with the high availability 
management toolkit 

The preferred method to start and stop high availability resources for the core
warehouse is by using the high availability management toolkit scripts:

► hastartdb2: Use to start the DB2 resources managed by Tivoli SA MP:

You have the option to specify a node list to start resources on multiple
nodes.

► hastopdb2: Use to stop the DB2 resources managed by Tivoli SA MP:

This command stops the DB2 database partition level resources and the
volume group resource group. You have the option to specify a node list to
stop resources on multiple nodes.

On the IBM Smart Analytics System 5600, you can start and stop the NFS server
using the hastartnfs and hastopnfs commands. For more information about the
high availability management toolkit commands, see 3.4, “Managing high
availability resources for the core warehouse”.

The commands must be run as the root user. The commands are asynchronous.
To check the progress of a running command and the state of the resources, use
the hals or lssam commands.

Before starting the high availability resources, check if all equivalencies and
resources are available using the lssam command. Example 3-4 shows an output
excerpt with the equivalency status from an IBM Smart Analytics System 7600.

Example 3-4 lssam output showing the equivalencies status

Online IBM.Equivalency:db2_db2home_gpfs_DataNode01-StandbyNode1-equ
        |- Online IBM.AgFileSystem:db2homefs_StandbyNode1:StandbyNode1
        '- Online IBM.AgFileSystem:db2homefs_DataNode01:DataNode01
Online IBM.Equivalency:db2_db2home_gpfs_DataNode02-StandbyNode1-equ
        |- Online IBM.AgFileSystem:db2homefs_StandbyNode1:StandbyNode1
        '- Online IBM.AgFileSystem:db2homefs_DataNode02:DataNode02
Online IBM.Equivalency:db2_db2home_gpfs_AdminNode-StandbyNode1-equ
        |- Online IBM.AgFileSystem:db2homefs_AdminNode:AdminNode
        '- Online IBM.AgFileSystem:db2homefs_StandbyNode1:StandbyNode1
Online IBM.Equivalency:db2_db2home_gpfs_UserNode-StandbyNode1-equ
        |- Online IBM.AgFileSystem:db2homefs_StandbyNode1:StandbyNode1
        '- Online IBM.AgFileSystem:db2homefs_UserNode:UserNode
Online IBM.Equivalency:db2_private_network
        |- Online IBM.NetworkInterface:en11:AdminNode
        |- Online IBM.NetworkInterface:en11:DataNode01
        |- Online IBM.NetworkInterface:en11:DataNode02
        |- Online IBM.NetworkInterface:en11:StandbyNode1
        '- Online IBM.NetworkInterface:en11:UserNode


This output shows that the equivalencies for the FCM network and the /db2home
file systems are online on all nodes. If any of them is Offline, find and fix the
cause first to prevent an unintentional failover action initiated by Tivoli
SA MP. For example, a malfunctioning or unplugged network cable prevents the
node level resource group from being started on that node and causes a failover
to the standby node, where the resource (equivalency) is online. To avoid this
situation, check the equivalency status before attempting to start the resource.
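
The equivalency check described above can be scripted. The following Python sketch is a hedged illustration: it assumes that the lssam output has been captured as text, that equivalency lines start in the first column, and that member lines are indented or start with a tree prefix. The function name offline_equivalency_members and the sample text (modeled on Example 3-4, with one interface set to Offline for demonstration) are hypothetical.

```python
from typing import List, Tuple

# Sample modeled on the equivalency output; one member is Offline on purpose.
SAMPLE = """\
Online IBM.Equivalency:db2_private_network
        |- Online IBM.NetworkInterface:en11:AdminNode
        |- Offline IBM.NetworkInterface:en11:DataNode01
        '- Online IBM.NetworkInterface:en11:StandbyNode1
Online IBM.Equivalency:db2_db2home_gpfs_AdminNode-StandbyNode1-equ
        |- Online IBM.AgFileSystem:db2homefs_AdminNode:AdminNode
        '- Online IBM.AgFileSystem:db2homefs_StandbyNode1:StandbyNode1
"""


def offline_equivalency_members(lssam_text: str) -> List[Tuple[str, str, str]]:
    """Return (equivalency, member, state) for every member that is not Online."""
    problems = []
    current = None  # name of the equivalency whose members are being read
    for raw in lssam_text.splitlines():
        line = raw.strip().lstrip("|'-").strip()
        state, sep, resource = line.partition(" IBM.")
        if not sep:
            continue  # blank or unrecognized line
        is_member = raw[:1] in (" ", "\t", "|", "'")
        if not is_member:
            # Top-level entry: track it only if it is an equivalency.
            if resource.startswith("Equivalency:"):
                current = resource.split(":", 1)[1]
            else:
                current = None
        elif current is not None and state != "Online":
            problems.append((current, "IBM." + resource, state))
    return problems


if __name__ == "__main__":
    for equivalency, member, state in offline_equivalency_members(SAMPLE):
        print(f"{equivalency}: {member} is {state}")
```

An empty result suggests that all equivalency members are Online. Any entry in the result is a reason to troubleshoot before starting the resources.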

The /db2home file system is deployed as a GPFS file system on the IBM Smart
Analytics System 7600 and 7700. It is managed by the operating system and is
mounted and shared across all nodes in the cluster automatically. You can mount
and unmount a GPFS file system using the following commands:

► mmumount all -a: Use to unmount the GPFS file systems on all nodes.

► mmmount all -a: Use to mount the GPFS file systems on all nodes.

For further GPFS information, see the website: 

http://www-03.ibm.com/systems/clusters/software/gpfs/resources.html

Also check the node status before attempting to start the node level resource. To 
check the node status, use the Tivoli SA MP command lsrpnode. 

Example 3-5 shows an lsrpnode command output from an IBM Smart Analytics
System 7600.


Example 3-5 lsrpnode output

Name        OpState RSCTVersion NodeNum NodeID
standbyNode Online  2.5.3.0     5       6ab290305fef0199
adminNode   Online  2.5.3.0     1       ee2fd0af24445c0a
dataNode02  Online  2.5.3.0     3       c7aac9e92d6291ca
dataNode01  Online  2.5.3.0     4       f4601f8a0ccef90a
userNode    Online  2.5.3.0     2       77d623b025a00520
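
The node status check can also be automated. The following Python sketch is illustrative only. It assumes that the lsrpnode output has been captured as text, that the first non-blank line is the header, and that the columns are separated by white space as in Example 3-5. The function name nodes_not_online is hypothetical.

```python
from typing import List

# Sample taken from the lsrpnode example in this section.
SAMPLE = """\
Name        OpState RSCTVersion NodeNum NodeID
standbyNode Online  2.5.3.0     5       6ab290305fef0199
adminNode   Online  2.5.3.0     1       ee2fd0af24445c0a
dataNode02  Online  2.5.3.0     3       c7aac9e92d6291ca
dataNode01  Online  2.5.3.0     4       f4601f8a0ccef90a
userNode    Online  2.5.3.0     2       77d623b025a00520
"""


def nodes_not_online(lsrpnode_text: str) -> List[str]:
    """Return the names of the nodes whose OpState is not Online."""
    rows = [line.split() for line in lsrpnode_text.splitlines() if line.strip()]
    if not rows:
        return []
    header, body = rows[0], rows[1:]
    name_col = header.index("Name")
    state_col = header.index("OpState")
    # A state written as two words still differs from "Online" in its
    # first word, so such a node is reported as well.
    return [row[name_col] for row in body if row[state_col] != "Online"]


if __name__ == "__main__":
    offline = nodes_not_online(SAMPLE)
    print("All nodes online" if not offline else f"Check these nodes: {offline}")
```

With the sample text, the function returns an empty list because every node is Online.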


The database administrator can still use db2start and db2stop to start and stop
a DB2 instance. When the DB2 instance owner issues db2stop, the DB2
instance service for each database partition is stopped, and the cluster
management software does not attempt to bring the instance online. The lssam
output shows the DB2 database partition resource in the Offline and Suspended
Propagated state. Tivoli SA MP interprets this as a maintenance activity by a
database administrator and suspends the high availability monitoring of the
resource.


Important: Do not attempt to start or stop database partition level resources 
when the DB2 database partition resources are in “Suspended Propagated” 
mode. Tivoli SA MP will prevent the resources from starting or stopping 
because it interprets that a maintenance task is in place. 


3.4.3 Starting and stopping resources with Tivoli SA MP commands 

Though the native Tivoli SA MP commands can be used to manage the core
warehouse resources on the IBM Smart Analytics System environments, the
preferred method is using the high availability management toolkit scripts.
The toolkit scripts automatically check whether the required dependencies are
available, and their command syntax is simpler.

Resources cannot be started and stopped individually in an HA cluster. Instead, 
you can control resources by changing the nominal state of the resource groups 
that contain them. 

As the root user, use the chrg command to change the nominal state for a
resource group. To change the nominal state for a node level resource group,
you must change the DB2 database partition level resource group first, then
change the node level resource group. Use the following command to change the
nominal state for a resource group:

chrg -o <NominalState> <ResourceGroup>

Here, the parameters have the following meanings: 

► <NominalState> is the state defined for the resource. The value is either 
Offline or Online. 

► <ResourceGroup> is the resource group name. 

Use this order to change the resources for an administration node to Offline: 

chrg -o Offline db2_bcuaix_0-rg 

chrg -o Offline db2_bcuaix_server_AdminNode-rg 

chrg -o Offline db2_bcuaix_AdminNode_vg-rg 

Bring the DB2 database partition level resource group offline first, then the node 
level resource group, and the volume group level resource group last. 

To monitor the resource status, use the lssam command. While a resource is
changing state, it shows “Pending Offline” if the change is from Online to
Offline, and “Pending Online” if the change is from Offline to Online.
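
The ordering rule above can be captured in a small helper that only builds the command strings and does not run them. The following Python sketch is an illustration. The resource group naming pattern (db2_<instance>_<partition>-rg, db2_<instance>_server_<node>-rg, db2_<instance>_<node>_vg-rg) is inferred from the examples in this section and is an assumption, and the function name offline_commands is hypothetical. Verify the actual group names with lssam before use.

```python
from typing import List


def offline_commands(instance: str, node: str, partitions: List[int]) -> List[str]:
    """Build chrg commands in the order: partition groups, node group, volume group."""
    commands = [f"chrg -o Offline db2_{instance}_{p}-rg" for p in partitions]
    commands.append(f"chrg -o Offline db2_{instance}_server_{node}-rg")
    commands.append(f"chrg -o Offline db2_{instance}_{node}_vg-rg")
    return commands


if __name__ == "__main__":
    # Administration node of the bcuaix instance, which hosts partition 0.
    for command in offline_commands("bcuaix", "AdminNode", [0]):
        print(command)
```

For the administration node, the sketch prints the same three commands as listed above, in the same order. Because the commands are asynchronous, check the state with lssam between steps.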

3.4.4 Manual node failover for maintenance 

Moving resources from the primary node to the standby node reduces the
service interruption required for system maintenance activities. After completing
the maintenance tasks, the resources have to be moved back to the primary
node. The manual failover for system maintenance must be performed during a
maintenance window.

The high availability toolkit scripts do not check for in-flight transactions nor
warn against them. Always verify that the system is quiesced properly and that
there are no in-flight transactions when performing a manual node failover, to
avoid a service interruption and a rollback of any in-flight transactions.

To move the resources for the core warehouse server, use the high availability
management toolkit script hafailover. This command moves the resources from
the specified server to the standby node of its high availability group.


Root user: All high availability management toolkit commands must be 
performed as the root user. 


For example, to move the resources from the administration node to the standby
node, issue this command:

hafailover <adminNode>

In this command, <adminNode> is the administration node host name in the
cluster. The command stops all Tivoli SA MP managed resource groups on the
administration node, then starts the resources on the standby node. The Tivoli
SA MP managed resources are the DB2 database partition level resource, the
volume group resource, and the Service IP resource.

You can use either lssam or hals to monitor the resource status and the failover
progress.

Example 3-6 shows the hals output after the hafailover command was run and
the resources are moved to the standby node.

Example 3-6 hals output in failover state


| PARTITIONS | PRIMARY    | SECONDARY   | CURRENT LOCATION | RESOURCE OPSTATE | HA STATUS |
| 1,2,3,4    | dataNode01 | standbyNode | dataNode01       | Online           | Normal    |
| 5,6,7,8    | dataNode02 | standbyNode | dataNode02       | Online           | Normal    |
| 0          | adminNode  | standbyNode | standbyNode      | Online           | Failover  |
| 990        | userNode   | standbyNode | userNode         | Online           | Normal    |


When the node requires long maintenance hours, after failing the resources over
to the standby node, you can place the node in an ineligible list by using the
samctrl Tivoli SA MP command as the root user:

samctrl -u a <adminNode>

Here, the <adminNode> is the host name of the node to be placed in the ineligible 
list in the cluster domain. 

Then you can unmount all shared file systems. For the IBM Smart Analytics
System 7600 and 7700, where the file systems are managed by GPFS, use the
commands:

mmumount /db2home

mmumount /home

For the IBM Smart Analytics System 5600, Tivoli SA MP manages the NFS
server and will unmount the shared home directories automatically. 


Nodes: Removing a node from the cluster domain is an optional step when 
doing a manual failover. 


3.4.5 Manual node fallback 


When the maintenance tasks complete, fail the resources back to the primary
nodes from the standby node or reintegrate the cluster domain if the node was
placed in the ineligible list. Just as with the failover process, the failback must
be performed during a planned maintenance window.

Check that all equivalencies and the node to be failed back are online before
starting the failback process. Also check that automation is active in the
cluster using the following command:

lssamctrl


Example 3-7 shows that the automation is active.

Example 3-7 Tivoli SA MP control output

# lssamctrl
Displaying SAM Control information:

SAMControl:
        TimeOut                = 60
        RetryCount             = 3
        Automation             = Auto
        ExcludedNodes          = {}
        ResourceRestartTimeOut = 5
        ActiveVersion          = [3.2.0.0,Mon Oct 11 04:23:55 CDT 2010]
        EnablePublisher        = Disabled
        TraceLevel             = 31
        ActivePolicy           = []
        CleanupList            = {}
        PublisherList          = {}


If automation is inactive, use this command to activate it: 
samctrl -M F 


Important: Use the capital “M” option to take the cluster out of manual mode.
(The lowercase “m” means “migrate”.)
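
The automation check can be scripted as well. The following Python sketch is illustrative only. It assumes that the lssamctrl output has been captured as text and that each setting appears on its own line as "name = value". The value "Auto" for active automation comes from Example 3-7. The value "Manual" used in the demonstration and the function names sam_control and automation_active are assumptions.

```python
from typing import Dict


def sam_control(lssamctrl_text: str) -> Dict[str, str]:
    """Collect the 'name = value' pairs from captured lssamctrl output."""
    settings = {}
    for line in lssamctrl_text.splitlines():
        name, sep, value = line.partition("=")
        if sep:  # only lines that contain an equals sign carry a setting
            settings[name.strip()] = value.strip()
    return settings


def automation_active(lssamctrl_text: str) -> bool:
    """Return True when the Automation setting is reported as Auto."""
    return sam_control(lssamctrl_text).get("Automation") == "Auto"


if __name__ == "__main__":
    sample = """\
SAMControl:
        TimeOut                = 60
        RetryCount             = 3
        Automation             = Auto
"""
    print(automation_active(sample))
    # Assumed value for manual mode, used here only for demonstration.
    print(automation_active(sample.replace("Auto", "Manual")))
```

If the check returns False, review the lssamctrl output yourself before you change the mode with samctrl.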

If the node was placed in the ineligible list during the failover, it must be removed
from the list before starting the resources. To remove the node from the
ineligible list, run the following command:

samctrl -u d <adminNode>

Here, <adminNode> is the host name for the node that is being taken from the 
ineligible list in the domain. 

After the node is back in the domain, check the node status using lsrpdomain,
and run the lssam command to check if the equivalencies are online.

When all equivalencies and the node are online, you can perform failback using 
the high availability management toolkit script hafailback. For example, to fail 
back the administration node, the command is: 
hafailback <adminNode> 

Here, the parameter <adminNode> is the host name of the administration node to 
be failed back to the primary position. 

Alternatively, you can use the Tivoli SA MP commands to fail the resources back
to the primary position. To move the resources back to the primary position, the
nominal state of the DB2 database partition level resource group and the node
level resource group must be changed to Offline. After the resource groups are
brought to the Online nominal state again, the resources are started at their
preferred node position which, in this case, is the primary node.

See 3.4.3, “Starting and stopping resources with Tivoli SA MP commands”
for information about how to stop and start the resources using Tivoli SA MP
commands.

Sometimes the resource nominal status for the equivalencies is something other
than “Online”. It can be, for example, Stuck Offline or Failed Offline. If this is the
case for the /db2home file system, try to restart the NFS server for the IBM
Smart Analytics System 5600 using hastopnfs and hastartnfs. For the IBM
Smart Analytics System 7600 and 7700, because /db2home is a GPFS file
system, unmount and mount the file system on that node using these commands:

mmumount /db2home
mmmount /db2home

If the nominal states for the network equivalencies are showing as Failed or 
Stuck, reset the resource using the hareset command from the high availability 
management toolkit: 
hareset nooffline 

If resetting the resource cannot bring the resource online, you can restart the
entire node and then start the resources within the domain by performing the
following steps:

1. Stop all high availability managed resources using hastopdb2. This high
availability management toolkit command is the preferred method for this
task.

2. Unmount the shared file system:

- Use hastopnfs for the IBM Smart Analytics System 5600.

- Issue mmumount all -a for the IBM Smart Analytics System 7600 and
7700.

3. Bring the domain to the offline state using stoprpdomain bcudomain.

Bring the domain offline after all resources are stopped and the shared file
systems are unmounted. The IBM Smart Analytics System high availability
configuration always uses bcudomain as the default domain name.

4. Start the domain using startrpdomain bcudomain.

5. Check if the domain is online using the lsrpdomain and lsrpnode commands.

6. Mount the shared file systems:

After the domain is online, mount the shared file systems with the commands:

- hastartnfs for the IBM Smart Analytics System 5600

- mmmount all -a for the IBM Smart Analytics System 7600 and 7700

7. Check to see if all equivalencies are online using lssam.

8. Start the resources using hastartdb2.

9. The resources must be started at their primary location. Check the status
using lssam.
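
The steps above differ between models only in how the shared file systems are unmounted and mounted. The following Python sketch lists the commands in order for a given model so that the sequence can be reviewed before it is run by hand. It is an illustration only: the function name restart_plan and the model strings are hypothetical, and the sketch does not execute any command.

```python
from typing import List


def restart_plan(model: str, domain: str = "bcudomain") -> List[str]:
    """Return the ordered commands for the domain restart procedure."""
    if model == "5600":
        unmount, mount = "hastopnfs", "hastartnfs"            # NFS based /db2home
    elif model in ("7600", "7700"):
        unmount, mount = "mmumount all -a", "mmmount all -a"  # GPFS based /db2home
    else:
        raise ValueError(f"unknown model: {model}")
    return [
        "hastopdb2",                # step 1: stop the managed resources
        unmount,                    # step 2: unmount the shared file systems
        f"stoprpdomain {domain}",   # step 3: bring the domain offline
        f"startrpdomain {domain}",  # step 4: start the domain
        "lsrpdomain",               # step 5: check the domain ...
        "lsrpnode",                 #         ... and the nodes
        mount,                      # step 6: mount the shared file systems
        "lssam",                    # step 7: check the equivalencies
        "hastartdb2",               # step 8: start the resources
        "lssam",                    # step 9: check the resource locations
    ]


if __name__ == "__main__":
    for number, command in enumerate(restart_plan("7700"), start=1):
        print(f"{number:2d}. {command}")
```

Because the toolkit commands are asynchronous, confirm the result of each step before you issue the next command.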

If for any reason the resources are not brought online at their primary location or
are failing, more troubleshooting must be done. Tivoli SA MP writes its log
messages to the operating system syslog. Examine the syslog file on the node
that has the problem.

To check to see if a resource can be started on the failing node, start the 
resource manually. To do that, you must place the Tivoli SA MP in the manual 
mode using the following command: 
samctrl -M T 

Then start the resource that failed to start. For example, if the DB2 database 
partition resource failed to start, then start the DB2 database partition using the 
command: 
db2start nodenum N 

Or, if a file system resource does not come online, manually mount the file
system on the node. If the problem is resolved, place Tivoli SA MP back in
automatic mode using the command:

samctrl -M F

If the problem persists, contact the IBM Smart Analytics System Support with 
detailed information about the situation. 

For further information about high availability management and configuration for 
the core warehouse servers, see the IBM Smart Analytics System User’s Guide 
for your respective version. 


3.5 High availability for the warehouse application 
module 


An IBM Smart Analytics System warehouse applications module can consist of 
either one or two nodes: 

► Warehouse application node: 

A required node that hosts all the IBM InfoSphere Warehouse application 
components: 

- InfoSphere Warehouse Administration Console 

- InfoSphere Warehouse SQL Warehousing Tool 

- IBM Alphablox 

- InfoSphere Warehouse Miningblox 

- Cubing Services component 

- WebSphere Application Server 

For IBM Smart Analytics System offerings running DB2 for Linux, UNIX, and 
Windows Version 9.7, the application server metadata is hosted in a DB2 
database (iswmeta). For IBM Smart Analytics System offerings running DB2 
for Linux, UNIX, and Windows Version 9.5, the metadata is hosted in a table 
space (DWEDEFAULTCONTROL) at the warehouse production instance. 

► OLAP node: 

An optional node that hosts the Cubing Services components. The OLAP
node is added to scale out the cube servers. The cube server runs in its own
Java™ virtual machine (JVM) space and does not require hosting on a
WebSphere Application Server.

For most IBM Smart Analytics System offerings, the optional high availability 
configuration for the warehouse applications module uses an active-active 
failover configuration. 

Figure 3-5 shows a diagram of an active-active configuration. Note that the 5600
V1 offering uses active-passive failover for the warehouse applications module.
This chapter focuses on active-active failover, which is used in all other IBM
Smart Analytics System offerings.


[Figure: two active nodes, each failing over to the other]


Figure 3-5 Mutual failover high availability configuration 


High availability can be implemented for a warehouse applications module only 
when the module includes two nodes (the warehouse applications node and the 
OLAP node). 

Each application server has its file system defined on external storage, 
/usr/IBM/dwe/appserver_001 for the warehouse application node and 
/usr/IBM/dwe/appserver_002 for the OLAP node. Both nodes in the application 
server high availability cluster have access to these file systems. On offerings 
based on DB2 9.7, there is an additional file system named /iswhome on external 
storage. This file system is assigned to the warehouse applications node. 

Tivoli SA MP manages the application servers to provide high availability. The
default domain name is DB2WHSE_HA. Example 3-8 shows an lsrpdomain
command output from the warehouse application server.

Example 3-8 Application servers high availability cluster domain

lsrpdomain
Name       OpState RSCTActiveVersion MixedVersions TSPort GSPort
DB2WHSE_HA Online  2.5.3.0           No            12347  12348

This domain is independent from the core warehouse high availability cluster
domain. The DB2WHSE_HA domain has its own set of resources and resource
groups.

The resources are grouped as follows:

► Warehouse application node: 

This resource group includes the following resources: 

- Application resources: Includes the InfoSphere Warehouse Administration 
Console, Alphablox platform, and the appserver_001 file system. 

- Metadata resources: Includes the IBM InfoSphere Warehouse metadata
database resources.

- Service IPs: Includes IP addresses for accessing warehouse application 
resources. 

In most IBM Smart Analytics System offerings, the warehouse application 
node manages another service IP dedicated for accessing the metadata 
resources. 

- User-created Alphablox applications and cube servers (when there is no 
OLAP node). 

► OLAP node: 

This resource group includes the following resources: 

- OLAP resources: Includes the Cubing Services cube server resources 
and appserver_002 file system. 

- Service IPs: Includes IP addresses for accessing OLAP node. 

- User-created cube server resources: When a new cube server is created
by the user, it must be defined as a resource group to Tivoli SA MP so the
new cube server can be highly available. In addition to Tivoli SA MP, you
can optionally manage the user-created cube servers with the IBM
InfoSphere Warehouse Administration Console.

Figure 3-6 shows a high availability configuration for the warehouse application
servers. 



Figure 3-6 Warehouse applications modules high availability configuration 


The high availability management for the warehouse applications and OLAP 
nodes is performed using the Tivoli SA MP commands only. There are no high 
availability management toolkit scripts available. 

Example 3-9 shows an lssam output taken from an IBM Smart Analytics System
7600 with one warehouse application node (whsrv) and one OLAP node
(olapnode). You can see the location where each resource is running. Both the
DB2 dweadmin instance service and the /iswhome file system are running on the
warehouse application server (whsrv).

Example 3-9 IBM Smart Analytics System 7600 lssam output

Online IBM.ResourceGroup:db2_dweadmin_0-rg Nominal=Online
        |- Online IBM.Application:db2_dweadmin_0-rs
                |- Online IBM.Application:db2_dweadmin_0-rs:whsrv
                '- Offline IBM.Application:db2_dweadmin_0-rs:olapnode
        '- Online IBM.Application:db2mnt-iswhome-rs
                |- Online IBM.Application:db2mnt-iswhome-rs:whsrv
                '- Offline IBM.Application:db2mnt-iswhome-rs:olapnode
Online IBM.ResourceGroup:db2whse_ha_whsrv_type1.AppserverFilesystemsServiceIps.rg Nominal=Online
        |- Online IBM.Application:db2whse_ha_whsrv_type1.appserver_001_filesystem
                |- Online IBM.Application:db2whse_ha_whsrv_type1.appserver_001_filesystem:whsrv
                '- Offline IBM.Application:db2whse_ha_whsrv_type1.appserver_001_filesystem:olapnode
        |- Online IBM.ServiceIP:db2whse_ha_whsrv_type1.10_199_64_126
                |- Online IBM.ServiceIP:db2whse_ha_whsrv_type1.10_199_64_126:whsrv
                '- Offline IBM.ServiceIP:db2whse_ha_whsrv_type1.10_199_64_126:olapnode
        '- Online IBM.ServiceIP:db2whse_ha_whsrv_type1.10_199_65_126
                |- Online IBM.ServiceIP:db2whse_ha_whsrv_type1.10_199_65_126:whsrv
                '- Offline IBM.ServiceIP:db2whse_ha_whsrv_type1.10_199_65_126:olapnode
Online IBM.ResourceGroup:db2whse_ha_whsrv_type1.abxplatform.rg Nominal=Online
        '- Online IBM.Application:db2whse_ha_whsrv_type1.abxplatform
                |- Online IBM.Application:db2whse_ha_whsrv_type1.abxplatform:whsrv
                '- Offline IBM.Application:db2whse_ha_whsrv_type1.abxplatform:olapnode
Online IBM.ResourceGroup:db2whse_ha_whsrv_type1.adminconsole.rg Nominal=Online
        '- Online IBM.Application:db2whse_ha_whsrv_type1.adminconsole
                |- Online IBM.Application:db2whse_ha_whsrv_type1.adminconsole:whsrv
                '- Offline IBM.Application:db2whse_ha_whsrv_type1.adminconsole:olapnode
Online IBM.ResourceGroup:db2whse_ha_whsrv_type1.appserver.rg Nominal=Online
        '- Online IBM.Application:db2whse_ha_whsrv_type1.appserver
                |- Online IBM.Application:db2whse_ha_whsrv_type1.appserver:whsrv
                '- Offline IBM.Application:db2whse_ha_whsrv_type1.appserver:olapnode
Online IBM.ResourceGroup:db2whse_ha_olapnode_type2.AppserverFilesystemsServiceIps.rg Nominal=Online
        |- Online IBM.Application:db2whse_ha_olapnode_type2.appserver_002_filesystem
                |- Offline IBM.Application:db2whse_ha_olapnode_type2.appserver_002_filesystem:whsrv
                '- Online IBM.Application:db2whse_ha_olapnode_type2.appserver_002_filesystem:olapnode
        |- Online IBM.ServiceIP:db2whse_ha_olapnode_type2.10_199_64_127
                |- Offline IBM.ServiceIP:db2whse_ha_olapnode_type2.10_199_64_127:whsrv
                '- Online IBM.ServiceIP:db2whse_ha_olapnode_type2.10_199_64_127:olapnode
        '- Online IBM.ServiceIP:db2whse_ha_olapnode_type2.10_199_65_127
                |- Offline IBM.ServiceIP:db2whse_ha_olapnode_type2.10_199_65_127:whsrv
                '- Online IBM.ServiceIP:db2whse_ha_olapnode_type2.10_199_65_127:olapnode
Online IBM.Equivalency:db2_dweadmin_0-rg_group-equ
        |- Online IBM.PeerNode:whsrv:whsrv
        '- Online IBM.PeerNode:olapnode:olapnode
Online IBM.Equivalency:db2whse_ha_whsrv_type1_networkadapter_equ
        |- Online IBM.NetworkInterface:en12:olapnode
        '- Online IBM.NetworkInterface:en12:whsrv
Online IBM.Equivalency:db2whse_ha_whsrv_type1_networkadapter_equ_en13
        |- Online IBM.NetworkInterface:en13:olapnode
        '- Online IBM.NetworkInterface:en13:whsrv
Online IBM.Equivalency:db2whse_ha_whsrv_type1_nodes_equiv
        |- Online IBM.PeerNode:whsrv:whsrv
        '- Online IBM.PeerNode:olapnode:olapnode
Online IBM.Equivalency:db2whse_ha_olapnode_type2_networkadapter_equ
        |- Online IBM.NetworkInterface:en12:olapnode
        '- Online IBM.NetworkInterface:en12:whsrv
Online IBM.Equivalency:db2whse_ha_olapnode_type2_networkadapter_equ_en13
        |- Online IBM.NetworkInterface:en13:olapnode
        '- Online IBM.NetworkInterface:en13:whsrv
Online IBM.Equivalency:db2whse_ha_olapnode_type2_nodes_equiv
        |- Online IBM.PeerNode:olapnode:olapnode
        '- Online IBM.PeerNode:whsrv:whsrv

3.5.1 Starting and stopping high availability resources for warehouse
application servers 

For the IBM Smart Analytics System 5600 V2 and 7700, the warehouse 
applications module is always managed using Tivoli SA MP commands, whether 
it contains one node or two nodes. Use Tivoli SA MP commands to start and stop 
the resources. 

With the 5600 V1 and 7600, the nodes are set up to be managed using Tivoli SA
MP only when the configuration has two nodes. In that case, use Tivoli SA MP
commands to start and stop the resources. If the configuration has a single
node, you must use the regular InfoSphere Warehouse methods to start and stop
components. For example, to start the InfoSphere Warehouse metadata database,
issue db2start as the DB2 instance owner (dweadmin).

To check whether a Tivoli SA MP cluster domain exists, use lsrpdomain. If the
output shows that the DB2WHSE_HA domain exists, always use the Tivoli SA MP
chrg command to start and stop the resources.

Example 3-10 shows how to start an InfoSphere Warehouse metadata database 
resource. 

Example 3-10 Changing resource group nominal state to Online 
chrg -o Online -s "Name like 'db2_%_0-rg'" 
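
The domain check described above can be scripted. The following Python sketch is illustrative only: it assumes that lsrpdomain prints a header row followed by one row per domain, and the function names and the sample text (including its version and port values) are our own assumptions rather than captured product output.

```python
def domain_states(lsrpdomain_output):
    """Map each domain name to its OpState from lsrpdomain-style text."""
    rows = [line.split() for line in lsrpdomain_output.splitlines() if line.strip()]
    if not rows:
        return {}
    header, body = rows[0], rows[1:]
    name_col = header.index("Name")
    state_col = header.index("OpState")
    return {row[name_col]: row[state_col] for row in body if len(row) > state_col}


def start_stop_method(lsrpdomain_output, domain="DB2WHSE_HA"):
    """Return which tool should start and stop the warehouse resources."""
    if domain in domain_states(lsrpdomain_output):
        return "chrg"        # a cluster domain exists: use Tivoli SA MP
    return "infosphere"      # no domain: regular InfoSphere Warehouse methods


# Illustrative text only; real output comes from running lsrpdomain.
sample = """Name       OpState RSCTActiveVersion MixedVersions TSPort GSPort
DB2WHSE_HA Online  2.5.4.1           No            12347  12348
"""
print(start_stop_method(sample))
print(start_stop_method(""))
```

The sketch only decides which method applies. It does not run chrg or db2start itself.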


There are dependencies defined on the warehouse applications nodes. For
example, the IBM InfoSphere Warehouse metadata resource must always be
started before the WebSphere Application Server resource.


3.5.2 Manual failover warehouse application node 

The following active resources are running on the warehouse application node: 

► IBM InfoSphere Warehouse metadata database resources 

► Service IPs and file system appserver_001 resources 

► WebSphere Application Server resource 

► Alphablox platform resources 

► IBM InfoSphere Warehouse Administration Console resource 

The following active resources are running on the OLAP node:

► Service IP

► File system appserver_002

► OLAP resources

Before failing over resources, ensure that all activity is quiesced on the
warehouse applications module. One technique to move resources from one
node to another is to add the node that currently hosts them to the ineligible
node list using the Tivoli SA MP command samctrl:

samctrl -u a <node name> 

Here, <node name> is the host name of the warehouse application node or OLAP 
node. 

This command brings down all resources that are currently running on that
node. When all resources are stopped, Tivoli SA MP attempts to start those
resources on the next available node. You can verify the status and location of
the resources using the lssam command.

You can use this method to move the resources running on the warehouse 
application node to the OLAP node or vice versa to perform maintenance 
activities. 

3.5.3 Manual failback of the warehouse application node

When maintenance is concluded, you can move the failed-over resources back to
their designated primary node. When the warehouse application resources have
failed over to the OLAP node, they can be failed back to the primary location
without impacting the running activities on the OLAP node. That is the case if the
cube servers are not running on the IBM InfoSphere Warehouse Administration
Console. When the resources from the OLAP node have failed over to the
warehouse application node, the resources can be failed back to the OLAP node
without stopping the running activities on the warehouse application node.

Use the lsrpnode command to check the node status. Use the lssamctrl
command to check if the node is in the excluded list, and then use the lssam
command to check the equivalencies status before restarting the resources.

To fail back, first check if the node is in the excluded list using lssamctrl.
Example 3-11 shows that when the warehouse application node resources were
failed over to the OLAP node manually, the node was added to the list of
excluded nodes.

Example 3-11 The warehouse application node in the excluded node list
# lssamctrl

Displaying SAM Control information:

SAMControl:
        TimeOut                = 60
        RetryCount             = 3
        Automation             = Auto
        ExcludedNodes          = {WarehouseAppNode}
        ResourceRestartTimeOut = 5
        ActiveVersion          = [3.1.0.7,Wed Oct 27 10:44:27 EDT 2010]
        EnablePublisher        = Disabled
        TraceLevel             = 31
        ActivePolicy           = []
        CleanupList            = {}
        PublisherList          = {}


Set the node to online in the domain and remove it from the ineligible category 
using the following command: 
samctrl -u d <nodename> 

The node is eligible to accept resources again. 
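
The excluded-list check can be automated before the node is re-enabled. The following Python sketch is a hedged illustration: it assumes that lssamctrl prints the attribute in the form ExcludedNodes = {node1,node2}, and the helper names are hypothetical.

```python
import re


def excluded_nodes(lssamctrl_output):
    """Return the node names listed in the ExcludedNodes attribute."""
    match = re.search(r"^\s*ExcludedNodes\s*=\s*\{([^}]*)\}",
                      lssamctrl_output, re.MULTILINE)
    if match is None:
        return []
    return [name.strip() for name in match.group(1).split(",") if name.strip()]


def reenable_command(lssamctrl_output, node):
    """Return the samctrl command that re-enables node, or None if not excluded."""
    if node in excluded_nodes(lssamctrl_output):
        return f"samctrl -u d {node}"
    return None


# Illustrative excerpt in the shape of lssamctrl output.
sample = """SAMControl:
        TimeOut       = 60
        ExcludedNodes = {WarehouseAppNode}
"""
print(reenable_command(sample, "WarehouseAppNode"))
```

The function returns the command text instead of running it, so that the operator can review the command before issuing it as root.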

To move resources back to the designated primary node when the servers are set
up in an active/active configuration, use chrg commands to stop all resource
groups associated with the node whose resources are failed over. Do not
use the samctrl command, because it moves all resources, not
just the failed-over resources.

To move the resources from the OLAP node back to the warehouse application
node, stop the resources that have the warehouse application node as their
primary server and are currently running on the OLAP node. Use the Tivoli SA MP
command chrg in the following sequence:

1. Stop the resources for the Alphablox platform (including the user-created
applications), and InfoSphere Warehouse Administration Console.

2. Stop the WebSphere Application Server resource.

3. Stop the application server file system (/appserver_001) and service IPs
resources.

4. Stop the InfoSphere Warehouse metadata database resources.

Start the resources using the chrg command on the warehouse application node 
in the following order: 

1. Start the application server file system (/appserver_001) and service IPs
resources.

2. Start the InfoSphere Warehouse metadata database resources.

3. Start the WebSphere Application Server resource.

4. Start the resources for the Alphablox platform (including the user-created
applications), and InfoSphere Warehouse Administration Console.

In the case of failing back the resources to the OLAP node, after the OLAP node
is brought online with the samctrl -u d <OLAP_Node> command, stop the
resources designated to the OLAP node from the warehouse node using the
Tivoli SA MP command chrg in the following sequence:

1. Stop the cube server resources.

2. Stop the application server file system (/appserver_002) and service IPs 
resources. 

After these resources are stopped, start them on the OLAP node
using the chrg command in the following order:

1. Start the application server file system (/appserver_002) and service IPs
resources. 

2. Start the cube server resources. 
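
The stop and start sequences for both failback directions can be captured as ordered lists, which makes it harder to issue the chrg commands in the wrong order. The following Python sketch only builds the command lines. The step keys and the resource group names are hypothetical placeholders (take the real names from the lssam output), and the selection string follows the Name like form used in Example 3-10.

```python
# Step order as described in this section.
WAREHOUSE_STOP = ["alphablox_and_admin_console", "websphere",
                  "filesystem_and_service_ips", "metadata_database"]
WAREHOUSE_START = ["filesystem_and_service_ips", "metadata_database",
                   "websphere", "alphablox_and_admin_console"]
OLAP_STOP = ["cube_server", "filesystem_and_service_ips"]
OLAP_START = ["filesystem_and_service_ips", "cube_server"]


def chrg_plan(groups, stop_order, start_order):
    """Return chrg command lines: every stop step first, then every start step."""
    plan = []
    for state, order in (("Offline", stop_order), ("Online", start_order)):
        for step in order:
            for rg in groups[step]:
                plan.append(f"chrg -o {state} -s \"Name like '{rg}'\"")
    return plan


olap_groups = {  # placeholder resource group names, for illustration only
    "cube_server": ["<cube-server-rg>"],
    "filesystem_and_service_ips": ["<appserver_002-rg>"],
}
for command in chrg_plan(olap_groups, OLAP_STOP, OLAP_START):
    print(command)
```

Verify with lssam that each stop step has completed before issuing the first start command. The sketch does not perform that check.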

In this section, we provide the basic information about how to manage the high 
availability resources for the warehouse application nodes. For further details 
about the IBM Smart Analytics System warehouse application nodes, see the 
IBM Smart Analytics System User’s Guide for your respective version. 

For more information about IBM Tivoli System Automation, see this address:

http://www-947.ibm.com/support/entry/portal/Overview/Software/Tivoli/Tivoli_System_Automation_for_Multiplatforms

For more information about the IBM InfoSphere Warehouse Application
component, see this address:

http://publib.boulder.ibm.com/infocenter/db2luw/v9r7/index.jsp?topic=/com.ibm.dwe.navigate.doc/welcome_warehouse.html

3.6 High availability for the business intelligence 
module 

The IBM Smart Analytics System business intelligence (BI) module hosts the
IBM Cognos software, which provides BI capabilities such as dashboarding, query
reporting, and analysis. All nodes have the same software stack when the BI
module is deployed. The following major software is installed on the BI nodes:

► IBM Cognos BI Server

► IBM Cognos Go! Dashboard 

► IBM WebSphere Application Server 

► IBM HTTP Server 

► IBM DB2 Enterprise Server Edition 

► IBM Tivoli System Automation

From the high availability configuration point of view, the most relevant Cognos
components are as follows:

► Gateway: This component receives the user requests, validates and encrypts 
the password, and captures the required information for sending the request 
to the IBM Cognos Server. The gateway then passes the request to a 
dispatcher for later processing. High availability for this component is 
managed by Tivoli SA MP. 

► Report service: This component manages the report requests and delivers
the results through a web portal named Cognos Connection. The report
requests are managed by the Cognos dispatcher. High availability for this
component is managed by the Cognos server.

► Content Manager: This component manages the storage of application data
such as models, report specifications, report outputs, configuration data, and
security. This information is required for publishing packages, retrieving
schedule information, restoring report specifications, and managing the
Cognos namespace. The high availability for this component is managed by
the Cognos Server. The database resources for the content store are managed
by Tivoli SA MP.

From the functional point of view, the BI nodes are classified into three types:

► BI type 1 node: This node type manages user requests
through the Cognos gateway and processes report requests. The number
of reports processed is based on the weight associated with the Cognos
dispatcher and varies depending on the number of BI extension nodes. The
BI type 1 node also hosts the standby Cognos Content Manager and provides
high availability support for the content store. Every IBM Smart Analytics
System BI module has one type 1 node.

► BI type 2 node: This node hosts the active Cognos Content Manager, the
content store database, and the audit database. It also processes report
requests. The gateway on this node provides high availability support to the
Cognos gateway on the BI type 1 node. Every IBM Smart Analytics System BI
module has one type 2 node.

► BI extension node: The primary role of this type of node is report
processing. This node is configured to have a maximum report processing
capacity. The gateway is installed but not used, and the Content Manager is
installed but not started. The IBM Smart Analytics System 5600 BI module can
have zero to four extension nodes. The IBM Smart Analytics System 7600 can
have zero to two extension nodes.

The IBM Smart Analytics System 5600 consists of two to six BI nodes: one type
1, one type 2, and zero to four extension nodes. Figure 3-7 on page 61 shows the
minimum configuration for the IBM Smart Analytics System 5600 BI module: one
type 1 BI node and one type 2 BI node.



Figure 3-7 BI module with two nodes HA configuration
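
The node-count rules for the BI module (one type 1 node, one type 2 node, and up to four extension nodes on the 5600 or up to two on the 7600) can be expressed as a small validation routine. The following Python sketch is our own illustration of those rules. The function name and the message texts are assumptions, not part of any IBM tool.

```python
# Maximum number of BI extension nodes per model, as stated in this section.
MAX_EXTENSION_NODES = {"5600": 4, "7600": 2}


def bi_module_problems(model, type1, type2, extension):
    """Return a list of rule violations for a planned BI module layout."""
    if model not in MAX_EXTENSION_NODES:
        return [f"unknown model {model}"]
    problems = []
    if type1 != 1:
        problems.append("exactly one type 1 node is required")
    if type2 != 1:
        problems.append("exactly one type 2 node is required")
    limit = MAX_EXTENSION_NODES[model]
    if not 0 <= extension <= limit:
        problems.append(f"extension nodes must be between 0 and {limit}")
    return problems


print(bi_module_problems("5600", 1, 1, 4))
print(bi_module_problems("7600", 1, 1, 3))
```

An empty list means that the planned layout is within the limits described here.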


The IBM Smart Analytics System 7600 consists of two to four BI nodes: one type
1, one type 2, and zero to two extension nodes. Figure 3-8 shows a configuration
with one type 1 BI node, one type 2 BI node, and one BI extension node.

Figure 3-8 BI module with three nodes HA configuration


The high availability strategy for the IBM Smart Analytics System BI module is
mutual failover (active/active configuration). It is implemented
using native Cognos BI Server functionality to manage the active Cognos
Content Manager and Tivoli SA MP to manage high availability resource groups
for the Cognos gateway and BI module database resources.

If the BI type 1 node fails, Tivoli SA MP
detects the failure and transfers the Cognos gateway resource to the BI type 2
node. When a failure occurs on the BI type 2 node, the Cognos BI Server
application detects the failure and designates the BI type 1 node as the active
Content Manager. Tivoli SA MP also manages the database resources on the BI
type 2 node. When a failure occurs, the database resources are transferred to
the BI type 1 node.

Because report processing is available on every node, there is no need to set
up a failover node for the BI extension nodes. If a failure occurs on any BI
extension node, the report processing can start on another server, although
the capacity to handle the workloads can be reduced.

The high availability resources present on the IBM Smart Analytics System BI
nodes are managed by native IBM Cognos BI Server and IBM Tivoli System
Automation. The following highly available resources are managed by IBM
Cognos BI Server:

► IBM Cognos Content Manager 

► IBM Cognos Gateway 

► IBM Cognos Report Service 

The IBM Tivoli System Automation manages these resources: 

► IBM Cognos gateway resource group:

This resource group, ihs-rg, has the service IP for the end users to reach the
IBM Cognos gateway. It also has the HTTP server. There are dependencies
among the resources managed by Tivoli SA MP. If one of
these resources fails, both the failed resource and its dependencies are moved to
the secondary node.

► BI node database resource group:

This resource group, db2_coginst_0-rg, holds the BI node database instance
with the content store database, auditing database, and Samples database.

It has the resources for the file systems /cogfs and /coghome, the BI node
instance, and the service IP for the management network that handles the
connection to the databases under the BI node instance.

The primary role of the BI type 1 node is to process the user requests coming from
the IBM Cognos gateway. The Tivoli SA MP resource group for the IBM Cognos
gateway is assigned to this node. The IBM Cognos Content Manager on this node
is in a standby state. If Tivoli SA MP detects that the server
failed or is unreachable, it automatically transfers the resources in the IBM
Cognos gateway resource group to the BI type 2 node, and makes the BI type 2
node the active IBM Cognos gateway.

When the BI type 1 node becomes operational again, the gateway service IP
address remains assigned to the type 2 node. IBM Smart Analytics System
provides a set of scripts to fail back the resources to the BI type 1 node. Run
these scripts when there is no activity on the cluster to prevent workload
disruption.

When the BI type 2 node server fails or is unable to connect to the internal
application network, both the IBM Cognos BI Server application and
Tivoli SA MP detect the failure and automatically perform the failover. The IBM Cognos
BI Server application conducts an election to designate a new active Content
Manager. As a result of the election process, the BI type 1 node becomes the
new active Content Manager. Tivoli SA MP then transfers the resources in
the BI module database resource group to the BI type 1 node, mounts the /cogfs
file system, and starts the BI module instance as well as the service IP.


Nodes: The BI extension nodes are not part of the HA configuration for the
IBM Smart Analytics System BI module. The BI extension nodes only
carry out report processing workloads. They cannot assume any resource from
the BI type 1 node or the BI type 2 node in case of failure.


To fail back the resources for the BI module instance and activate the IBM
Cognos Content Manager at the BI type 2 node, activate the IBM Cognos
Content Manager on the BI type 2 node first, then move back the BI module
instance resource group using the provided script.

Use lssam to check the nominal state for the high availability resources.

Example 3-12 shows an lssam output for a BI module containing one type 1 node
and one type 2 node.

Example 3-12 BI module lssam output

Online IBM.ResourceGroup:db2_coginst_0-rg Nominal=Online
        |- Online IBM.Application:db2_coginst_0-rs
                |- Offline IBM.Application:db2_coginst_0-rs:BI_module_tp1
                '- Online IBM.Application:db2_coginst_0-rs:BI_module_tp2
        |- Online IBM.Application:db2mnt-cogfs-rs
                |- Offline IBM.Application:db2mnt-cogfs-rs:BI_module_tp1
                '- Online IBM.Application:db2mnt-cogfs-rs:BI_module_tp2
        '- Online IBM.ServiceIP:db2ip_192_168_122_104-rs
                |- Offline IBM.ServiceIP:db2ip_192_168_122_104-rs:BI_module_tp1
                '- Online IBM.ServiceIP:db2ip_192_168_122_104-rs:BI_module_tp2
Online IBM.ResourceGroup:ihs-rg Nominal=Online
        |- Online IBM.Application:ihs-rs
                |- Online IBM.Application:ihs-rs:BI_module_tp1
                '- Offline IBM.Application:ihs-rs:BI_module_tp2
        '- Online IBM.ServiceIP:ihs-sip-rs
                |- Online IBM.ServiceIP:ihs-sip-rs:BI_module_tp1
                '- Offline IBM.ServiceIP:ihs-sip-rs:BI_module_tp2
Online IBM.Equivalency:FCM_network
        |- Online IBM.NetworkInterface:en11:BI_module_tp2
        '- Online IBM.NetworkInterface:en11:BI_module_tp1
Online IBM.Equivalency:db2_coghome_gpfs_BI_module_tp1-BI_module_tp2-equ
        |- Online IBM.AgFileSystem:coghome_BI_module_tp1:BI_module_tp1
        '- Online IBM.AgFileSystem:coghome_BI_module_tp2:BI_module_tp2
Online IBM.Equivalency:db2_coginst_0-rg_group-equ
        |- Online IBM.PeerNode:BI_module_tp2:BI_module_tp2
        '- Online IBM.PeerNode:BI_module_tp1:BI_module_tp1
Online IBM.Equivalency:ihs_network_equiv
        |- Online IBM.NetworkInterface:en12:BI_module_tp1
        '- Online IBM.NetworkInterface:en12:BI_module_tp2
Online IBM.Equivalency:ihs_nodes_equiv
        |- Online IBM.PeerNode:BI_module_tp1:BI_module_tp1
        '- Online IBM.PeerNode:BI_module_tp2:BI_module_tp2
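
Reading lssam output by eye becomes error-prone as the number of resources grows. The following Python sketch shows one possible way to parse output shaped like Example 3-12 and to answer two questions: whether every equivalency is online, and on which node a given resource is online. The regular expression and the function names are our own assumptions about the text layout, not an interface provided by Tivoli SA MP.

```python
import re

# Optional tree-drawing prefix, operational state (may be two words, such as
# "Failed Offline"), resource class, resource name with optional ":node"
# suffix, and the nominal state that appears on resource group lines.
LSSAM_LINE = re.compile(
    r"^[\s|'-]*(?P<state>[A-Z][A-Za-z ]*?)\s+"
    r"IBM\.(?P<cls>\w+):(?P<name>\S+)"
    r"(?:\s+Nominal=(?P<nominal>\w+))?"
)


def parse_lssam(text):
    """Turn lssam-style text into a list of dictionaries, one per line."""
    entries = []
    for line in text.splitlines():
        match = LSSAM_LINE.match(line)
        if match:
            name, _, node = match.group("name").partition(":")
            entries.append({"state": match.group("state"),
                            "cls": match.group("cls"),
                            "name": name,
                            "node": node or None,
                            "nominal": match.group("nominal")})
    return entries


def offline_equivalencies(entries):
    """Names of equivalencies whose operational state is not Online."""
    return [e["name"] for e in entries
            if e["cls"] == "Equivalency" and e["state"] != "Online"]


def online_node(entries, resource):
    """Node on which the named resource is Online, or None."""
    for e in entries:
        if e["name"] == resource and e["node"] and e["state"] == "Online":
            return e["node"]
    return None


sample = """Online IBM.ResourceGroup:ihs-rg Nominal=Online
        |- Online IBM.Application:ihs-rs
                |- Online IBM.Application:ihs-rs:BI_module_tp1
                '- Offline IBM.Application:ihs-rs:BI_module_tp2
Online IBM.Equivalency:ihs_nodes_equiv
        |- Online IBM.PeerNode:BI_module_tp1:BI_module_tp1
        '- Online IBM.PeerNode:BI_module_tp2:BI_module_tp2
"""
entries = parse_lssam(sample)
print(offline_equivalencies(entries), online_node(entries, "ihs-rs"))
```

Treat the parser as a starting point and check it against the output of your own system before relying on it.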


Commands: The commands described in this section apply to the IBM Smart
Analytics System 5600 and 7700. For the IBM Smart Analytics System 7600,
the installation path is under the /usr directory instead of /opt/.


3.6.1 Starting and stopping high availability resources for BI module

This section describes the procedure to start and stop the high availability
resources for the BI module.

Starting the BI module

To start the resources on the BI module, perform these steps:

1. Start the cluster domain.

Check if the cluster domain is online using the Tivoli SA MP command
lsrpdomain. The output is similar to Example 3-13.

Example 3-13 IBM Tivoli System Automation cluster domain for BI module

Name      OpState RSCTActiveVersion MixedVersions TSPort GSPort
cognos_bi Online  2.5.4.1           No            12347  12348

If the cluster domain is offline, log in as root on the BI type 2 node and start
the domain using the following Tivoli SA MP command:
startrpdomain cognos_bi

2. Check the status of the Tivoli SA MP resources using lssam. Check if all
equivalencies are online.

3. Start the resources from the BI type 2 node using the command:
chrg -o "Online" -s "1=1"

This command starts the BI module database resource and the IBM Cognos
gateway resource.

4. Check the status using lssam.

Example 3-14 shows a sample output. The resource group for the BI module
database resources (db2_coginst_0-rg) and the resource group for the IBM
Cognos gateway (ihs-rg) must appear as Online. If any of the resources failed
to start, troubleshoot the problem before starting the application servers.
Contact IBM Customer Support for the IBM Smart Analytics System for
further assistance.

Example 3-14 BI module lssam output

Online IBM.ResourceGroup:db2_coginst_0-rg Nominal=Online
        |- Online IBM.Application:db2_coginst_0-rs
                |- Offline IBM.Application:db2_coginst_0-rs:BI_module_tp1
                '- Online IBM.Application:db2_coginst_0-rs:BI_module_tp2
        |- Online IBM.Application:db2mnt-cogfs-rs
                |- Offline IBM.Application:db2mnt-cogfs-rs:BI_module_tp1
                '- Online IBM.Application:db2mnt-cogfs-rs:BI_module_tp2
        '- Online IBM.ServiceIP:db2ip_192_168_122_104-rs
                |- Offline IBM.ServiceIP:db2ip_192_168_122_104-rs:BI_module_tp1
                '- Online IBM.ServiceIP:db2ip_192_168_122_104-rs:BI_module_tp2
Online IBM.ResourceGroup:ihs-rg Nominal=Online
        |- Online IBM.Application:ihs-rs
                |- Online IBM.Application:ihs-rs:BI_module_tp1
                '- Offline IBM.Application:ihs-rs:BI_module_tp2
        '- Online IBM.ServiceIP:ihs-sip-rs
                |- Online IBM.ServiceIP:ihs-sip-rs:BI_module_tp1
                '- Offline IBM.ServiceIP:ihs-sip-rs:BI_module_tp2
Online IBM.Equivalency:FCM_network
        |- Online IBM.NetworkInterface:en11:BI_module_tp2
        '- Online IBM.NetworkInterface:en11:BI_module_tp1
Online IBM.Equivalency:db2_coghome_gpfs_BI_module_tp1-BI_module_tp2-equ
        |- Online IBM.AgFileSystem:coghome_BI_module_tp1:BI_module_tp1
        '- Online IBM.AgFileSystem:coghome_BI_module_tp2:BI_module_tp2
Online IBM.Equivalency:db2_coginst_0-rg_group-equ
        |- Online IBM.PeerNode:BI_module_tp2:BI_module_tp2
        '- Online IBM.PeerNode:BI_module_tp1:BI_module_tp1
Online IBM.Equivalency:ihs_network_equiv
        |- Online IBM.NetworkInterface:en12:BI_module_tp1
        '- Online IBM.NetworkInterface:en12:BI_module_tp2
Online IBM.Equivalency:ihs_nodes_equiv
        |- Online IBM.PeerNode:BI_module_tp1:BI_module_tp1
        '- Online IBM.PeerNode:BI_module_tp2:BI_module_tp2


5. Check if the DB2 database for the IBM Cognos content store is available.
From the BI type 2 node, issue lssam and make sure that the BI module
database resources are Online at the BI type 2 node as shown in
Example 3-14. If the resources are not online or are running on the BI type 1
node, try to restart the resource group using the command:
chrg -o Offline -s "Name like 'db2_coginst_0' "

Then:

chrg -o Online -s "Name like 'db2_coginst_0' "

If the resources fail to start at the BI type 2 node, more troubleshooting needs
to be performed. Contact IBM Customer Support for the IBM Smart Analytics
System for further assistance if required.

6. Verify the database connection.

Log on to the BI type 2 node as the content store instance owner, the coginst
user. Connect to the content store database:

db2 connect to csdb

Here, csdb is the content store database.

If the connection fails, look at the database logs to find out what
causes the connection to fail.

7. Start the application server.

Log on to the BI type 2 node as the cognos user (this is the default user
ID). Start the application server on the BI nodes, beginning with the BI type 2
node and then the BI type 1 node, using the following commands:
cd /opt/IBM/WebSphere_7/AppServer/profiles/AppSrv01/bin/
./startServer.sh server1

8. Verify that the IBM Cognos Content Manager is running.

Check if the IBM Cognos Content Manager is running on the BI type 2 node
by accessing the Content Manager status page at the URL:
http://<hostBInode2>:9081/p2pd/servlet
Here, <hostBInode2> is the host name of the BI type 2 node.

Make sure that the state is Running. If it is Running as Stand By, activate the
Content Manager at the BI type 2 node using the procedure described at
“Designating the active IBM Cognos Content Manager” on page 72.


Content store: The application server must be started when the content
store database is available. If it was started when the content store was not
available, refresh the application server using the following commands
(logged in as the cognos user):

cd /opt/IBM/WebSphere_7/AppServer/profiles/AppSrv01/bin/
./stopServer.sh server1
./startServer.sh server1

9. After the application server is started, verify that the dashboarding and
reporting features are available by accessing the IBM Cognos Connection
Portal.

For further reference about how to access IBM Cognos applications, see the IBM
Smart Analytics System product documentation.

Stopping the BI module

To stop the resources on the BI module, perform the following steps:

1. Make sure that there are no running workloads on the environment.

The stop procedure must be planned and performed within a maintenance
window to avoid interrupting the report processing.

2. Stop the application server.

To avoid unnecessary failover, stop the application server on the BI extension
node first (if there is a BI extension node), then stop the application server on
the BI type 1 node before stopping the application server on the BI type 2 node.
To stop the application server, use the following commands:
cd /opt/IBM/WebSphere_7/AppServer/profiles/AppSrv01/bin
./stopServer.sh server1

If an application server is stopped properly, you see a message similar to this:
Server server1 stop completed

3. Stop the high availability resources.

To stop the high availability resources managed by Tivoli SA MP, use the
following command:

chrg -o offline -s "1=1"

Verify the resource status using the lssam command. All resources must be
offline.

4. Bring the cluster domain offline.

After all resources are offline, you can place the cluster domain offline using
the following command:

stoprpdomain cognos_bi

Check the domain status using the lsrpdomain command. Example 3-15 shows
the expected output.

Example 3-15 BI module HA cluster domain offline

Name      OpState RSCTActiveVersion MixedVersions TSPort GSPort
cognos_bi Offline 2.5.4.1           No            12347  12348
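
The order in which the application servers are stopped matters, because stopping the BI type 2 node first can trigger an unnecessary failover. The following Python sketch derives the stop order from a list of nodes and their roles and then appends the Tivoli SA MP commands from steps 3 and 4. The host names and the role labels are hypothetical, and the sketch prints the plan without running any command.

```python
# Lower number stops first: extension nodes, then type 1, then type 2.
STOP_PRIORITY = {"extension": 0, "type1": 1, "type2": 2}


def app_server_stop_order(nodes):
    """Sort (hostname, role) pairs into the order used to stop app servers."""
    ordered = sorted(nodes, key=lambda node: STOP_PRIORITY[node[1]])
    return [host for host, _role in ordered]


def stop_plan(nodes, domain="cognos_bi"):
    """Return the full BI module stop sequence as a list of text steps."""
    steps = [f"{host}: ./stopServer.sh server1"
             for host in app_server_stop_order(nodes)]
    steps.append('chrg -o offline -s "1=1"')
    steps.append(f"stoprpdomain {domain}")
    return steps


# Hypothetical host names for a three-node BI module.
nodes = [("bi_tp2", "type2"), ("bi_tp1", "type1"), ("bi_ext1", "extension")]
for step in stop_plan(nodes):
    print(step)
```

Because sorted is stable, several extension nodes keep the order in which they were listed.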

3.6.2 Manual failover BI type 1 node to BI type 2 node


The IBM Smart Analytics System BI type 1 node hosts the high availability
resources for the IBM Cognos gateway and handles the report processing
workloads for the data warehouse environment. You can manually move the
resources from the BI type 1 node over to the BI type 2 node for performing
maintenance activities.

To manually fail over the resources running on the BI type 1 node to the BI type 2
node, perform the following steps:

1. Stop the application server running on the BI type 1 node.

Log on to the BI type 1 node as the cognos user and stop all dashboard and
report processing using the following commands:
cd /opt/IBM/WebSphere_7/AppServer/profiles/AppSrv01/bin
./stopServer.sh server1

2. Move the IBM Cognos gateway resources to the BI type 2 node.

Log on to the BI type 1 node as root and run the following commands:

cd /root/scripts/tsa
./failover_sip.sh

Use lssam to check that the high availability resources are stopped on the BI
type 1 node.

3. Access the IBM Cognos Connection Portal to verify that the dashboarding and
reporting activities are available.

After the BI type 1 node becomes available to resume its duty, use the process
described in “Manual failback BI type 1 node” on page 70 to fail back its
resources manually.

3.6.3 Manual failover BI type 2 node to BI type 1 node

The IBM Smart Analytics System BI type 2 node hosts the high availability
resources for the IBM Cognos Content Manager, BI module instance, and the
report processing workloads for the data warehouse environment. You can
manually move the BI type 2 node resources over to the BI type 1 node for
maintenance activities.

To manually fail over the resources running on the BI type 2 node to the BI type 1
node, perform the following steps:

1. Stop the application server on the BI type 2 node.

Log on to the BI type 2 node as the cognos user and run these commands to
stop the application server:

cd /opt/IBM/WebSphere_7/AppServer/profiles/AppSrv01/bin
./stopServer.sh server1

2. Move the BI module instance resources to the BI type 1 node.

Log on to the BI type 2 node as the root user and run the following commands:

cd /root/scripts/tsa
./failover_db2.sh

Check the high availability resource status with the lssam command. Verify that
all BI module instance resources, db2_coginst_0, cogfs, coghome, and the
database resource ServiceIP, are online on the BI type 1 node.

3. Access the IBM Cognos Connection Portal to check if the dashboarding and
report services are available.

If you experience problems accessing the IBM Cognos Connection Portal, find
the problem using the procedure described in “Troubleshooting connections
after failover of BI type 2 node” on page 74.

3.6.4 Manual failback BI type 1 node

After the BI type 1 node is available again, the type 1 node resources running on
the BI type 2 node must be manually failed back to the BI type 1 node. Similar to all
failback operations, this activity must be planned to avoid disrupting the running
workloads.

Before performing the failback, check the nominal status for the resources using
the Tivoli SA MP command lssam. The resources for the gateway, when running in
failover mode, appear as in Example 3-16.

Example 3-16 lssam excerpt output showing gateway resources

Online IBM.ResourceGroup:ihs-rg Nominal=Online
        |- Online IBM.Application:ihs-rs
                |- Offline IBM.Application:ihs-rs:BI_module_tp1
                '- Online IBM.Application:ihs-rs:BI_module_tp2
        '- Online IBM.ServiceIP:ihs_192.168.182.200
                |- Failed Offline IBM.ServiceIP:ihs_192.168.182.200:BI_module_tp1
                '- Online IBM.ServiceIP:ihs_192.168.182.200:BI_module_tp2

If the BI type 1 node is operational and all equivalencies for the gateway
resources are online, you can start the failback process using these steps:

1. Log in to the BI type 2 node as the root user.

2. Verify that the BI type 2 node currently owns the gateway Service IP using the
lssam command.

3. Move the resources contained in the gateway resource group to the BI type 1
node:

cd /root/scripts/tsa
./failover_sip.sh

4. Verify that the BI type 1 node owns both the gateway Service IP resource and
the HTTP server resource:

a. Log in to the BI type 1 node as the root user.

b. Issue the lssam command, and verify the following conditions:

• BI type 1 node has an Online operational status for both resources.

• BI type 2 node has an Offline operational status for both resources.

• The resource group has an Online operational status.

c. Use the ifconfig -a command to check if the BI type 1 node has one
Ethernet adapter with a second inet entry that corresponds to the gateway
Service IP address.
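
The final check with ifconfig -a can also be scripted. The following Python sketch assumes the output style in which each address appears on its own line as inet followed by the address. The function name, the interface name, and the sample addresses are illustrative assumptions. Output styles that print the address differently (for example, with an addr: prefix) would need a different parser.

```python
def has_inet_address(ifconfig_output, address):
    """Return True if any 'inet <address> ...' line carries the given address."""
    for line in ifconfig_output.splitlines():
        fields = line.split()
        if len(fields) >= 2 and fields[0] == "inet" and fields[1] == address:
            return True
    return False


# Illustrative text only: one adapter with a base address and a second
# inet entry that stands for the gateway Service IP.
sample = """en12: flags=1e080863<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST>
        inet 192.168.182.10 netmask 0xffffff00 broadcast 192.168.182.255
        inet 192.168.182.200 netmask 0xffffff00 broadcast 192.168.182.255
"""
print(has_inet_address(sample, "192.168.182.200"))
```

Comparing whole fields rather than searching for a substring avoids a false match when one address is a prefix of another, such as 192.168.182.20 and 192.168.182.200.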

3.6.5 Manual failback BI type 2 node

The IBM Smart Analytics System BI type 2 node normally holds
the high availability resources for the IBM Cognos BI Server application, IBM
Cognos Content Manager, and the BI module instance. There can be three
resource failure scenarios on the BI type 2 node:

► System failure of type 2 node: The content store database resource group
and active IBM Cognos Content Manager are transferred to the BI type 1
node.

► The IBM Cognos BI Server application failure: The IBM Cognos Content
Manager on the BI type 2 node fails and the IBM Cognos Content Manager
on the BI type 1 node becomes active.

► The BI module instance failure: The databases managed by the BI module
instance become inaccessible but the IBM Cognos Content Manager remains
active on the BI type 2 node.

Depending on the cause of the failure, various failback procedures are applied.
To find out which component failed on the BI type 2 node, use the following methods:

► Check the IBM Cognos Content Manager status page:

Go to the URL:

http://<hostBInode2>:9081/p2pd/servlet

Here, <hostBInode2> represents the host name of the BI type 2 node.

If the state of the Content Manager is Running, then the Content Manager on
the BI type 2 node is the active one. Otherwise, either the IBM Cognos BI Server
application has failed on the BI type 2 node, or the BI type 2 node has
experienced a system failure.

► Check the nominal state for the high availability resources:

Use the Tivoli SA MP command lssam to check the nominal state of the resources
contained in the BI module instance resource group (db2_coginst_0-rg). If the
operational state is Failed Offline, then either the BI module instance has
failed or the BI type 2 node has experienced a system failure. If the BI module
instance resources have an operational state of Online, then the Cognos BI
Server application is the resource that has failed.

After the failed resources are identified and fixed, the BI type 2 node is ready to
reassume its resources. Before proceeding with the failback procedure, check if all
equivalencies are online using the lssam command. Just as with all failback
operations, this activity must be planned to avoid disrupting the running
workloads.

Apply the failback procedure that matches the failure found on the BI
type 2 node:

► If the BI type 2 node has experienced a system failure, proceed with
“Designating the active IBM Cognos Content Manager” and “Moving BI
module database resources” on page 73.

► If the IBM Cognos BI Server has failed on the BI type 2 node, proceed with
“Designating the active IBM Cognos Content Manager” on page 72.

► If the BI module instance has failed, and the database resources were moved
to the BI type 1 node, proceed with “Moving BI module database resources” on
page 73 to manually fail back the resources to the BI type 2 node.

Designating the active IBM Cognos Content Manager 

When the IBM Cognos Content Manager has failed over to the BI type 1 node, the 
IBM Cognos Content Manager at the BI type 2 node, the primary location, becomes 
standby. To fail the IBM Cognos Content Manager back from the BI type 1 node to 
the BI type 2 node, use the following procedure to designate the active IBM 
Cognos Content Manager: 

1. Access the IBM Cognos Connection portal at the URL: 
http://serviceIP/cognos8 

Here, serviceIP represents the IBM Cognos gateway Service IP address. 

2. Click Launch -> IBM Cognos Administration to start the Cognos 
Administration portlet. 

3. Click the System link on the left pane. 

4. Click the host name of the node that you want to designate as the active IBM 
Cognos Content Manager. For example, to designate the Content Manager 
on the BI type 2 node as the active Content Manager, click 
http://hostBInode2 (where hostBInode2 represents the host name of the 
type 2 node). 

5. Click the dispatcher for the node, which is represented by the suffix 
:9081/p2pd on the URL for the node. For example, the dispatcher for the BI 
type 2 node is represented as http://hostBInode2:9081/p2pd. 

6. Click the ContentManagerService entry. 

7. Click the Action drop-down arrow next to the ContentManagerService entry 
identified in the previous step. 

8. Click Activate to activate the Content Manager on the type 2 node. If you do 
not see the Activate link, the Content Manager on the type 2 node is already 
active. 

9. Check the status page for both BI nodes, and confirm that the State for the 
Content Manager is displayed as Running on the BI type 2 node, and is 
displayed as Running as Standby on the BI type 1 node. 


Caution: When performing a failback, do not select “Set as active by default”. 
This option will make the designated Content Manager assume the role of the 
active Content Manager when the type 2 node is brought online. It will make 
the failback of the Content Manager automatic and will result in a disruption of 
the running workloads. 


Moving BI module database resources 

Follow these steps to move the BI module database resources from the BI type 1 
node back to their primary location, the BI type 2 node: 

1. Log in to the BI type 1 node or the BI type 2 node as the root user. 

2. Move the resources contained in the content store resource group to the BI 
type 2 node: 

cd /root/scripts/tsa 
./failover_db2.sh 

3. Verify that the BI type 2 node has the resources contained in the BI module 
database resource group including the BI module database Service IP 
resource, BI module instance, and resources for the /cogfs file system. 

a. Issue the lssam command and verify: 

• The BI type 2 node has an Online operational status for all resources. 

• The BI type 1 node has an Offline operational status for all resources. 

• The BI module database resource group has an Online operational 
status. 

b. Issue the ifconfig -a command to verify that the type 2 node has one 
adapter with a second inet entry that corresponds to the content store 
Service IP address. 
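
The lssam checks in step 3a can be scripted. The following is a sketch under stated assumptions, not a supported tool: the function name check_node_state is hypothetical, and the parser assumes that lssam prints one resource per line in the form "<state> IBM.<class>:<resource>:<node>" behind a tree-drawing prefix, without color escape sequences (recent Tivoli SA MP releases offer a -nocolor option for this; verify it on your level). It prints every resource on the given node whose state differs from the expected one and returns a nonzero status if any is found or if the node does not appear at all.

```shell
#!/bin/sh
# Hypothetical helper: read lssam-style output on stdin and verify that
# every resource hosted on node $1 has the operational state $2.
check_node_state() {
    awk -v node="$1" -v want="$2" '
    {
        line = $0
        sub(/^[^A-Za-z]*/, "", line)        # drop the tree-drawing prefix
        idx = index(line, " IBM.")
        if (idx == 0) next
        state = substr(line, 1, idx - 1)    # can be two words, e.g. Failed Offline
        res = substr(line, idx + 1)
        sub(/[ \t].*$/, "", res)            # drop attributes such as Nominal=Online
        n = split(res, part, ":")
        if (n == 3 && part[3] == node) {
            seen++
            if (state != want) { bad++; print state ": " res }
        }
    }
    END { exit ((seen > 0 && bad == 0) ? 0 : 1) }'
}

# Possible usage (host names are placeholders):
#   lssam -nocolor | check_node_state hostBInode2 Online
#   lssam -nocolor | check_node_state hostBInode1 Offline
```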

Troubleshooting connections after failover of BI type 2 node 

The IBM Smart Analytics System BI type 2 node holds the Content Store 
database. If the BI type 2 node experiences a system failure and connections 
to the content store remain open, you might not be able to log on to the IBM 
Cognos Connection Portal. To work around this situation, perform the 
following steps: 

1. Stop the application server that is running on all BI nodes using the cognos 
user: 

cd /opt/IBM/WebSphere_7/AppServer/profiles/AppSrv01/bin 
./stopServer.sh server1 

2. Log on to the BI type 1 node as the DB2 instance owner coginst, and check 
whether any connections remain: 

db2 list applications for database csdb 

Here, csdb is the Content Store database. 

If there are connections, force all applications that are connected to csdb 
off the database. 

3. Restart the application server first on the BI type 1 node, then on the 
BI type 2 node, followed by the BI extension nodes (if any exist): 

cd /opt/IBM/WebSphere_7/AppServer/profiles/AppSrv01/bin 
./startServer.sh server1 
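
For step 2, the application handles to force can be taken from the list applications output. The following sketch only prints a FORCE APPLICATION command for review; it does not run it. The function name build_force_command is hypothetical, and the parser assumes the default column layout (Auth Id, Application Name, Appl. Handle, and so on) with no blanks inside the first two columns.

```shell
#!/bin/sh
# Hypothetical helper: read the output of
#   db2 list applications for database csdb
# on stdin and print a FORCE APPLICATION command for the handles found.
build_force_command() {
    awk '
        /^-+ / { body = 1; next }           # rows start after the dashed line
        body && $3 ~ /^[0-9]+$/ {           # third column is the Appl. Handle
            handles = handles (handles != "" ? ", " : "") $3
        }
        END {
            if (handles != "")
                printf "db2 \"force application (%s)\"\n", handles
        }'
}

# Possible usage as the instance owner:
#   db2 list applications for database csdb | build_force_command
```

Printing the command instead of running it leaves the decision with the administrator, which fits the caution that applies to all failover-related procedures in this section.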

Test the connection on the IBM Cognos Connection Portal. If the problem 
persists, more troubleshooting must be performed. For further assistance, 
contact IBM Customer Support for the IBM Smart Analytics System. 

For further information about the topics discussed in this section, see the IBM 
Smart Analytics System User’s Guide for your respective version. 

Maintenance 


Similar to other systems, the IBM Smart Analytics System requires maintenance. 
When an IBM Smart Analytics System is assembled at the IBM Customer 
Solution Center (CSC), all hardware components with firmware are verified for 
the right release level and updated as required. Every selected software and 
firmware level is tested as part of an integrated stack. 

IBM periodically tests new firmware or software versions as part of a validated 
stack for the IBM Smart Analytics System. It is important to follow the IBM 
specifications on upgrades to the IBM Smart Analytics System so that your 
system remains on a validated stack level. 

In this chapter we discuss the maintenance procedures to be followed for the IBM 
Smart Analytics System, including the backup and recovery strategies for key 
components such as the database and the operating system. 

4.1 Managing DB2 message logs 


To help you administer and monitor the database activities, DB2 provides 
diagnostic files, notification files, error logs, trap files, and dump files. For 
example, DB2 logs the database activity messages to the db2diag log file, and 
when a problem occurs, many diagnostic files might be created in the db2dump 
directory. By default, DB2 does not delete these log files, so you have to 
manage them to prevent them from taking up too much disk space. 

In this section, we discuss the tools and tips for managing the DB2 logs. 

4.1.1 The db2dback shell script 

The db2dback script is a shell script that allows you to back up the DB2 diagnostic 
data from the diagpath directory to a specified destination. This script works on 
both single database partition and multi-database partition environments with 
rotating and non-rotating logs. You must run this script on the first administration 
node. It connects to all database partitions and archives the diagnostic and 
message data to a specified file system. This script supports both AIX and Linux 
environments. 

In addition to archiving the diagnostic data, db2dback also allows you to maintain 
the data that has already been archived at the destination directory. The archived 
files might be compressed (an option) and purged after a specified number of 
days. 

For example, to archive and compress logs that are older than three days and 
delete them if they are older than seven days, use this command: 

./db2dback.ksh -a tz 3 -r 7 
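
To make the retention policy concrete, the following sketch applies the same idea (compress after a number of days, delete after a larger number of days) to an arbitrary directory with standard tools. It is an illustration only and does not reproduce db2dback.ksh; the function name prune_archive and its arguments are assumptions. Note that find -mtime +N matches files whose age in whole days is greater than N.

```shell
#!/bin/sh
# Illustrative retention policy only; this is not the db2dback.ksh script.
#   $1 = archive directory
#   $2 = compress files older than this many days
#   $3 = delete files older than this many days
prune_archive() {
    dir=$1
    compress_days=$2
    delete_days=$3
    # Delete first so that files about to be removed are not compressed.
    find "$dir" -type f -mtime +"$delete_days" -exec rm -f {} +
    # Compress what remains if it is old enough and not yet compressed.
    find "$dir" -type f -mtime +"$compress_days" ! -name '*.gz' -exec gzip {} +
}

# Possible usage (the path is a placeholder):
#   prune_archive /db2archive/diag 3 7
```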

For more details about this command and to download the db2dback.ksh shell 
script, see this developerWorks article, Archive and maintain DB2 message logs 
and diagnostic data with db2dback, located at this address: 
http://www.ibm.com/developerworks/data/library/techarticle/dm-0904db2messagelogs/index.html?ca=dth-grn&ca=dgp-my 

4.1.2 db2support -archive 

The db2support problem analysis and environment collection tool has a new 
feature to help manage DB2 logs, the -archive option. This option, introduced 
in DB2 9.7 Fix Pack 1, copies the contents of the log directory 
specified by the diagpath database manager configuration parameter into an 
archive path that you specify. 

The naming convention of the archive directory is 
DB2DUMP_<hostname>_<current timestamp>. All files under the source log 
directory are deleted after they are archived. 

Example 4-1 shows how to use the db2support -A command or the 
db2support -archive command to archive the DB2 diagnostic log files to the 
/db2home/bculinux/SLCF/ directory. 

Example 4-1 db2support output 

bculinux@ISAS56R1D1:~> db2support -A /db2home/bculinux/SLCF 


D B 2 Support 

This program generates information about a DB2 server, including information about its 
configuration and system environment. The output of this program will be stored in a 
file named 'db2support.zip', located in the directory specified on the application 
command line. If you are experiencing problems with DB2, it may help if this program is 
run while the problem is occurring. 

NOTES: 

1. By default, this program will not capture any user data from tables or logs to 
protect the security of your data. 

2. For best results, run this program using an ID having SYSADM authority. 

3. On Windows systems you should run this utility from a db2 command session. 

4. Data collected from this program will be from the machine where this program runs. 
In a client-server environment, database-related information will be from the 
machine where the database resides via an instance attachment or connection to the 
database. 


Attempting to archive files from DIAGPATH "/db2fs/bculinux/db2dump". 


DIAGPATH data have been successfully archived into 
"/db2home/bculinux/SLCF/DB2DUMP_ISAS56R1D1_2010-10-06-21.59.31". 


4.1.3 The db2diag utility 

Though the db2diag log files are intended for use by IBM Software Support for 
troubleshooting purposes, DB2 database administrators frequently check the 
db2diag log files for system messages. In a partitioned environment such as the 
IBM Smart Analytics System, every administration node and data node has its 
own db2diag log files. 

Finding information in multiple error log files can become time-consuming work. 
The db2diag utility is meant to filter and format messages in the db2diag log files. 
These two options can be helpful for combining the db2diag log files: 

► -global: This option includes all the db2diag log files from all the database 
partitions on all the hosts in the log file processing. 

► -merge: This option merges diagnostic log files and sorts the records based 
on the timestamp. 

Specifying the -global and -merge options together consolidates all the db2diag 
log files and sorts the records based on the timestamp. Both options support 
rotating diagnostic log files and files located in split diagnostic data directories. 

You must be the instance owner to run db2diag from the administration node. 

The following db2diag command example searches all the data nodes for the 
db2diag log files, extracts any messages classified as severe, and writes them 
to a single output file named db2diag.test: 

db2diag -global -merge -sdir /db2home/bculinux -l severe > ./db2diag.test 

Example 4-2 shows an excerpt of the combined db2diag log file. 

Example 4-2 Contents of the combined db2diag log file 

2010-09-22-13.15.29.668213-300 I1E1563 LEVEL: Event 
PID : 26707 TID : 47168874669168 PROC : db2stop 
INSTANCE: bculinux NODE : 000 
FUNCTION: DB2 UDB, RAS/PD component, pdLogInternal, probe:120 
START : New Diagnostic Log file 
DATA #1 : Build Level, 152 bytes 
Instance "bculinux" uses "64" bits and DB2 code release "SQL09072" 
with level identifier "08030107". 
Informational tokens are "DB2 V9.7.0.2", "s100514", "IP23089", Fix Pack "2". 
DATA #2 : System Info, 440 bytes 
System: Linux ISAS56R1D1 6 2 x86_64 
CPU: total:16 online:16 Cores per socket:8 Threading degree per core:1 
Physical Memory(MB): total:64436 free:58831 
Virtual Memory(MB): total:97210 free:91605 
Swap Memory(MB): total:32774 free:32774 
Kernel Params: msgMaxMessageSize:65536 msgMsgMap:65536 msgMaxQueueIDs:64512 
msgNumberOfHeaders:65536 msgMaxQueueSize:65536 
msgMaxSegmentSize:16 shmMax:9223372036854775807 shmMin:1 
shmIDs:16128 shmSegments:16128 semMap:256000 semIDs:16128 
semNum:256000 semUndo:256000 semNumPerID:250 semOps:32 
semUndoSize:20 semMaxVal:32767 semAdjustOnExit:32767 
Cur cpu time limit (seconds) = 0xFFFFFFFF 
Cur file size limit (bytes) = 0xFFFFFFFF 
Cur data size (bytes) = 0xFFFFFFFF 
Cur stack size (bytes) = 0x00800000 
Cur core size (bytes) = 0x00000000 
Cur memory size (bytes) = 0xFFFFFFFF 
nofiles (descriptors) = 0x00000800 


You can also filter the messages and format the text of the combined log file. To 
select just the messages classified as error, severe, or critical, and write 
them to a single output file named db2diag.<current_date>.out, use the following 
command: 

db2diag -global -merge -sdir /db2home/instance_name \ 
  -level "Error, Severe, Critical" \ 
  -fmt "%{ts}\tSeverity: %{level} \nInstance: %{inst}\tNode:%{node}\nFunction: %{function}\nError: %{msg}\nDescription: %{rc}\n" \ 
  > /db2home/instance_name/db2diag.`date +%Y%m%d`.out 

Example 4-3 shows an excerpt of the formatted db2diag file content. 

Example 4-3 Formatted db2diag file contents 

2010-09-22-13.15.50.555028 Severity: Error 
Instance: bculinux Node:001 
Function: DB2 UDB, common communication, sqlcctcpconnmgr, probe:50 
Error: ADM7006E The SVCENAME DBM configuration parameter was not 
configured. Update the SVCENAME configuration parameter using the 
service name defined in the TCP/IP services file. 
Description: 

2010-09-22-13.15.52.365632 Severity: Error 
Instance: bculinux Node:005 
Function: DB2 UDB, common communication, sqlcctcpconnmgr, probe:50 
Error: ADM7006E The SVCENAME DBM configuration parameter was not 
configured. Update the SVCENAME configuration parameter using the 
service name defined in the TCP/IP services file. 
Description: 

2010-09-22-13.16.09.155469 Severity: Error 
Instance: bculinux Node:000 
Function: DB2 UDB, common communication, sqlcctcpconnmgr, probe:50 
Error: ADM7006E The SVCENAME DBM configuration parameter was not 
configured. Update the SVCENAME configuration parameter using the 
service name defined in the TCP/IP services file. 
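
Once the messages are in this formatted layout, ordinary text tools can summarize them. As a minimal sketch, the following function counts the records per database partition by looking for the Node:NNN token that the -fmt string above produces. The function name count_errors_by_node is hypothetical, and the parser depends on that particular format string.

```shell
#!/bin/sh
# Hypothetical helper: count formatted db2diag records per node.
#   $1 = file written by the formatted db2diag command
# Prints "<node> <count>" lines sorted by node number.
count_errors_by_node() {
    awk '
        match($0, /Node:[0-9]+/) {
            node = substr($0, RSTART + 5, RLENGTH - 5)
            count[node]++
        }
        END { for (n in count) print n, count[n] }
    ' "$1" | sort
}

# Possible usage (the file name is a placeholder):
#   count_errors_by_node /db2home/instance_name/db2diag.20100922.out
```

A partition that shows a much higher count than its peers is a reasonable place to start reading the full records.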


For more details on the db2diag utility, see the DB2 Information Center at: 

http://publib.boulder.ibm.com/infocenter/db2luw/v9r7/index.jsp?topic=/com.ibm.db2.luw.admin.trb.doc/doc/c0020701.html 

4.2 Changing the date and time 


An IBM Smart Analytics System comprises many servers running copies of 
a DB2 relational database. Certain subdirectories in the administration node are 
mounted in all other DB2 nodes (data nodes, user nodes, and failover nodes) 
through NFS or GPFS mount points, for example, /home or /db2home. 

A few software and operating system components of the IBM Smart Analytics 
System require that the clocks on all the involved servers are synchronized. 
These components include software such as DB2 partitioned database, Tivoli 
System Automation for Multiplatform, NFS, and GPFS. 


Date and time: The date and time settings for all IBM Smart Analytics System 
servers must be the same, within a few minutes of tolerance. Otherwise, the 
NFS and the GPFS directories will be mounted but inaccessible. 


To run commands across all servers, you can build a script as shown in 
Example 4-4. The password-less ssh is set up for the root user between all nodes 
in the cluster, and for the DB2 instance owner for all nodes in the core warehouse 
instance. 

Example 4-4 Running a command across all IBM Smart Analytics System servers 

for i in IBMSMAS56R1ADM1 IBMSMAS56R1DTA1 IBMSMAS56R1DTA2 IBMSMAS56R1STDB1 
do 
    # $1 is the command string passed to this script as its first argument 
    printf '\nRunning command %s on NODE %s\n\n' "$1" "$i" 
    ssh "$i" "$1" 
done 


Here, IBMSMAS56R1ADM1, IBMSMAS56R1DTA1, IBMSMAS56R1DTA2, and 
IBMSMAS56R1STDB1 are the host names for the servers belonging to the IBM 
Smart Analytics System installation. 

In this example, the file was saved as run_cmd.sh under the /BCU_share directory. 
This directory is an NFS file system mounted across all IBM Smart Analytics 
System servers. 

Be sure to test that the script is properly configured with a simple command such 
as date or uptime. Run the command and carefully check that all the IBM Smart 
Analytics System servers are listed, including the management node, 
administration node, data nodes, user nodes, standby nodes, warehouse 
applications node, warehouse OLAP node, and business intelligence nodes. 

Example 4-5 shows how to test the run_cmd.sh script with date. 

Example 4-5 Testing the run_cmd.sh script 
ISAS56MGMT:/BCU_share # ./run_cmd.sh "date" 
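
Rather than comparing the date output by eye, the spread between the fastest and slowest clock can be computed. The following is a sketch under assumptions: the function name clock_spread is hypothetical, and the suggested way of collecting the input relies on date +%s (seconds since the epoch) being available on every node. Because the nodes are queried one after another, the ssh round-trip time adds a small error to the result.

```shell
#!/bin/sh
# Hypothetical helper: read "<host> <epoch-seconds>" lines on stdin and
# print the difference in seconds between the newest and oldest clock.
clock_spread() {
    awk '
        NR == 1     { min = $2 + 0; max = $2 + 0 }
        $2 + 0 < min { min = $2 + 0 }
        $2 + 0 > max { max = $2 + 0 }
        END { if (NR > 0) print max - min }
    '
}

# Possible usage with the host names from Example 4-4:
#   for h in IBMSMAS56R1ADM1 IBMSMAS56R1DTA1 IBMSMAS56R1DTA2 IBMSMAS56R1STDB1
#   do
#       printf '%s %s\n' "$h" "$(ssh "$h" date +%s)"
#   done | clock_spread
```

A result of more than a few minutes (for example, more than 120 seconds) indicates that the servers are outside the tolerance described above.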


To change the date and time for an IBM Smart Analytics System, follow these 
steps: 

1. Stop all activities on the business intelligence (BI) module and the warehouse 
application module. 

2. Stop DB2 Performance Expert. 

3. Stop all user and application connections to the database. Then deactivate 
the database. 

4. Optional: Back up, then delete db2diag.log and notify.log. 

5. If HA is implemented, stop the DB2 resources using the hastopdb2 command. 
Verify that the resources are offline. If HA is not implemented, stop the 
instance using the db2stop command. 

6. Verify that the application server resources are offline. 

7. Verify that the BI module resources are offline. 

Important: Do not change the date when the system is running with more 
than one user. For more details, see this address: 
http://publib.boulder.ibm.com/infocenter/aix/v6r1/index.jsp?topic=/com.ibm.aix.baseadmn/doc/baseadmndita/datecommand.htm 

8. Unmount the NFS or GPFS shared subdirectories. 

9. Update the date and time of the management node using the date command. 
For example, to set the date to December 25 14:53:00 (2010), use the 
command date 1225145310. 

10. Update the date and time on all other IBM Smart Analytics System servers. 
Example 4-6 shows how to update the date and time using the run_cmd.sh 
script. 

Example 4-6 Updating the date and time using run_cmd.sh script 
ISAS56MGMT:/BCU_share # ./run_cmd.sh "date 1225145310" 


11. List the date and time for all servers and check the output carefully to make 
sure they are all synchronized to the same minute. Example 4-7 shows how to 
check the date and time. 

Example 4-7 Checking the updated date and time 
ISAS56MGMT:/BCU_share # ./run_cmd.sh "date" 


12. Mount the NFS or GPFS subdirectories. 

13. Verify the changes. 

As the instance owner, change the current directory to /db2home and list its 
contents using the ls command. If the command prompt hangs, then one or 
more servers are not at the same date and time as the other servers. 
Unmount the NFS or GPFS subdirectories again and check the date and time 
on all the DB2 servers. Correct this problem before proceeding. 

Mount the NFS or GPFS subdirectories again. If you can now list the contents 
of the /db2home directory, the date and time of all DB2 Servers are updated 
and correct. The NFS or GPFS subdirectories are also properly mounted and 
the IBM Smart Analytics System can be put back to work: 

a. If HA is implemented, start the DB2 resources. If HA is not implemented, 
start the DB2 instance. 

b. Activate the database. 

c. Start the DB2 Performance Expert at the management node. 

d. Start the resources in the warehouse applications module and the Bl 
module. 

14. Take a full database backup. 

It is best to take a full database backup after verification, because the date 
and time change can compromise the use of the transaction log files for 
rollforward recovery. 
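
The argument passed to the date command in step 9 has the form mmddHHMMyy (month, day, hour, minute, two-digit year). The following sketch builds that string from separate values so that it can be checked before it is sent to all servers. The function name date_arg is an assumption made for illustration.

```shell
#!/bin/bash
# Hypothetical helper: build the mmddHHMMyy argument for the date command.
#   $1 = four-digit year, $2 = month, $3 = day, $4 = hour, $5 = minute
date_arg() {
    # 10# forces base 10 so that values such as 08 are not read as octal.
    printf '%02d%02d%02d%02d%02d\n' \
        "$((10#$2))" "$((10#$3))" "$((10#$4))" "$((10#$5))" "$((10#$1 % 100))"
}

# Possible usage with the run_cmd.sh script from Example 4-4:
#   ./run_cmd.sh "date $(date_arg 2010 12 25 14 53)"
```

For the values in step 9, date_arg 2010 12 25 14 53 yields 1225145310.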


4.3 IBM Smart Analytics System upgrades 

In this section we discuss the upgrade procedures for firmware and software in 
the IBM Smart Analytics System family offering. 

4.3.1 IBM Smart Analytics System software and firmware stacks 

The IBM Smart Analytics System and InfoSphere Balanced Warehouse® 
validated stack pages website provides links to all validated software and 
firmware stack pages for IBM Smart Analytics System and InfoSphere Balanced 
Warehouse configurations. Consult this site for the information about the 
firmware at this address: 

http://www-01.ibm.com/support/docview.wss?rs=0&uid=swg21429594 

It is best to keep your IBM Smart Analytics System on a validated stack level. 

Do not apply a fix pack just because it is the latest release. Doing so will cause 
your system to move off of a validated stack and you might experience problems 
by using an untested configuration. 

If you must deviate from the validated firmware and software stack, see the 
Frequently Asked Questions about software and firmware upgrades for the IBM 
Smart Analytics System and the InfoSphere Balanced Warehouse at this 
address: 

http://www-01.ibm.com/support/docview.wss?rs=3354&uid=swg21328726 


4.3.2 The Dynamic System Analysis tool 

To collect software and firmware information about the Linux-based IBM Smart 
Analytics System offerings, use the Dynamic System Analysis (DSA) tool. 

The DSA is usually already on the management node at the /opt/IBM/DSA 
directory. If it is not installed or if it is an older version, download the new version 
from IBM. Make sure to select the installable version of DSA, not the portable 
version. Download the software and the User’s Guide from this address: 
http://publib.boulder.ibm.com/infocenter/toolsctr/v1r0/index.jsp?topic=/dsa/dsa_main.html 

To run DSA, log on as the root user on the management node. The DSA 
command that generates the report is collectall. If it is run without any 
options, it generates a compressed XML file to be sent to IBM support. 

To obtain help about the command, enter collectall -h or collectall -?. 

To change the default report from a compressed XML file to HTML format, navigate 
to the /opt/IBM/DSA directory and run this command: collectall -x -v 

The parameters on the command provide this function: 

► -x: Suppress the compressed XML file. 

► -v: Generate HTML output. 

Example 4-8 shows how to generate a software and firmware report using the 
DSA collectall command. 

Example 4-8 Generating a software and firmware report 

ISAS56R1D1:/opt/IBM/DSA # ./collectall -x -v 
Dynamic System Analysis Version 3.02.56 
Logging console output to file 
/var/log/IBM_Support/DSA_Output_ISAS56R1D1_20101004-202326.txt 
Logging level set to Status. 
Running DSA collector providers pass 1. 
libasuprovider: Advanced Setting Utility(ASU) Setting Collector 
libbist: BIST 
libdiskmgt: Disk Management Information Collector 


The generated report files are under the subdirectories of the default DSA output 
directory /var/log/IBM_Support. Example 4-9 shows the directories and files of 
the DSA report. 


Example 4-9 DSA report directories and files 

ISAS56R1D1:/var/log/IBM_Support # ls -l 
total 8 
drwxr-xr-x 2 root root 4096 2010-10-04 20:26 7947AC1_KQXFCGM_20101004-202326 
-rw-r--r-- 1 root root  347 2010-10-04 20:13 DSA_Output_ISAS56R1D1_20101004-201325.txt 
ISAS56R1D1:/var/log/IBM_Support # 

ISAS56R1D1:/var/log/IBM_Support # cd 7947AC1_KQXFCGM_20101004-202326 
ISAS56R1D1:/var/log/IBM_Support/7947AC1_KQXFCGM_20101004-202326 # ls -la 
total 19732 
drwxr-xr-x 2 root root  4096 2010-10-04 20:26 . 
drwxr-xr-x 3 root root  4096 2010-10-04 20:51 .. 
-rwxr-xr-x 1 root root  4941 2010-10-04 20:26 banner_left.jpg 
-rwxr-xr-x 1 root root  9744 2010-10-04 20:26 banner_right.jpg 
-rw-r--r-- 1 root root  1556 2010-10-04 20:26 bist.html.html 
-rwxr-xr-x 1 root root   509 2010-10-04 20:26 cal1.jpg 
-rwxr-xr-x 1 root root 59936 2010-10-04 20:26 calendarPopup.js 
-rw-r--r-- 1 root root 52634 2010-10-04 20:26 chassis_event.html 
-rw-r--r-- 1 root root   252 2010-10-04 20:26 diags.html 
-rwxr-xr-x 1 root root 35551 2010-10-04 20:26 dom.js 

-rw-r--r-- 1 root root   718 2010-10-04 20:26 index.html 


In this example, a subdirectory named 7947AC1_KQXFCGM_20101004-202326 
was generated. Copy the whole directory to a Windows client and open the 
index.html file with a web browser. The output is detailed and easy to use: 
click the link for the information you want to know more about. Figure 4-1 
shows a Dynamic Systems Analysis HTML report. 

Figure 4-1 Dynamic Systems Analysis HTML report 


4.3.3 IBM Smart Analytics System Control Console 

The IBM Smart Analytics System Control Console provides systems 
management capability for the IBM Smart Analytics System. The console can 
update firmware and software by installing fix packs downloaded from Fix 
Central. The IBM Smart Analytics System Control Console also allows you to 
view detailed information about the hardware and software components in the 
system, and to change passwords for specific users across the entire system. 

The IBM Smart Analytics System Control Console supports certain 5600, 7600, 
and 7700 offerings. For the details about IBM Smart Analytics System Control 
Console, see the IBM Smart Analytics System User’s Guide for your respective 
version. 

4.4 IBM HealthCheck Service 


The IBM HealthCheck Service is designed to ensure that an IBM Smart Analytics 
System is still performing at its optimal level. This service must be carried out by 
the IBM services team after the sixth and twelfth months of an IBM Smart 
Analytics System installation. After that, it must be run once a year. 

The HealthCheck Service provides an in-depth IBM Smart Analytics System 
analysis on the following areas: 

► Overall IBM Smart Analytics System configuration review: 

- Adherence to standard IBM Smart Analytics System methodology 

- Operating system, database, database tools and storage subsystem levels 

► Operating System review: 

- Conformance to IBM Smart Analytics design 

- Error logs 

► DB2 instance review: 

- Conformance to IBM Smart Analytics methodology 

- DB2 instance-level settings 

- Instance level error logs 

► DB2 database review: 

- Conformance to IBM Smart Analytics methodology 

- Object layout and definition 

- Use of DB2 new features 

- DB2 database-level settings 

► Hardware management review: 

- Logged system events 

- Customer notification settings 

- Call home settings 

► Storage Subsystem review: 

- Profile and configuration conformance to IBM Smart Analytics 
methodology 

- Storage Subsystem event logs 

- Firmware level 

► Operational considerations: 

- Monitoring of database during peak and off-peak times 

For more information about IBM HealthCheck Service, visit this address: 

http://public.dhe.ibm.com/software/data/sw-library/services/InfoSphere_Warehouse_HealthCheck.pdf 


4.5 IBM Smart Analytics System installation report 

When an IBM Smart Analytics System installation and configuration is completed 
at the IBM Customer Solution Center, an installation report is created. This report 
includes system information and the results of the quality assurance tests 
performed. The same tests are run again at the customer site as part of the 
deployment, and the test results are added to the installation report. Customers 
are encouraged to read this report and use it as a reference in case information 
about the IBM Smart Analytics System is needed. It contains detailed information 
about the entire system. 

Figure 4-2 shows the front page of an installation report. 

You can find details about system configuration and test results in the area 
shown in Figure 4-3. The initial AIX and DB2 configurations, as well as the high 
availability configurations, are listed. 


3 Table of Contents 

Customer Information 
Customer Worksheet 
Architecture and Hardware Profile 
Rack Diagram 
Software Stack 
Network Configuration 
Network Point-to-point Diagram 
Storage Subsystem Configuration 
Fibre Point-to-point Diagram 
AIX Configuration 
File System 
DB2 Configuration 
DB2 HA Configuration 

InfoSphere Warehouse Application Server HA Configuration 
Network Throughput Validation 
I/O Performance Validation 
DB2 Performance Validation 
High Availability Test 
System Certification 
References and Next Steps 


Figure 4-3 Installation report table of contents 


4.6 IBM Smart Analytics System backup and recovery 

Data is one of the critical assets for an enterprise. Ensuring the availability of this 
asset is the most important matter when architecting a data warehouse 
environment. Implementing an efficient backup and recovery strategy that meets 
the business service level agreements becomes an essential task in designing a 
data warehouse environment. 

The IBM Smart Analytics System consists of various components, all of which 
need to be considered in the backup and recovery strategy in order to provide 
optimal protection of data in the data warehouse environment. In this section we 
discuss the backup and recovery plans and strategies for the IBM Smart 
Analytics System. For further references about backup and recovery strategies in 
IBM Smart Analytics System environments, see the IBM Smart Analytics System 
User’s Guide for your respective version and the best practices documents that 
are available at IBM developerWorks®: 
http://www.ibm.com/developerworks/data/bestpractices/ 

4.6.1 Operating system backup and recovery 

In this section we discuss techniques and considerations for backing up and 
recovering the operating system of the individual nodes in the IBM Smart 
Analytics System. The methods used differ for AIX and Linux-based systems, as 
the two platforms have quite different backup and recovery methods available. 

AIX 

IBM Smart Analytics System 7600 and 7700 are AIX-based systems. In this 
section, we discuss AIX backup and recovery techniques. 

Backup 

For AIX systems, the operating system provides a very mature and reliable 
backup and recovery method in the mksysb utility. This command allows creation 
of a bootable backup image of an AIX system that can be written to tape, CD, or 
file. A good way to use mksysb in an AIX based IBM Smart Analytics System is to 
create mksysb backup files of each node in the cluster that can then be used to 
perform a network boot to recover the contents of the node if required. 

The management node in an AIX based IBM Smart Analytics System is 
configured as a network installation manager (NIM) server, and can be used to 
restore any of the other nodes in the cluster using mksysb backup images. For 
this reason, a good place to store the mksysb backups created for the various 
nodes is on the management node, so that they are easily accessible in the 
event of a restore. 

The mksysb command must be run on the node being backed up. The image can 
be written either to a local file then transferred to the management node, or to a 
remote file system on the management node that is GPFS or NFS mounted 
locally. 

We suggest that a script be written to automate the creation of mksysb backup 
images for all nodes, and scheduled to run regularly (weekly or monthly) through 
cron so that a reasonably up-to-date mksysb image for all nodes is always 
available. 

Create a new file system named /sysbackup on the management node to 
accommodate the mksysb backup files created on all nodes. Exporting this file 
system with GPFS or NFS gives an easy way to write mksysb files created on 
each node to a central location on the management node. 

The /etc/exclude.rootvg file must be created on each server and populated with a 
list of directories in the root volume group which must not be backed up. 
Example 4-10 shows a sample exclude.rootvg file. 

Example 4-10 Sample exclude.rootvg file 

/sysbackup/ 
/tmp/ 
/var/tmp/ 
/etc/vg/ 


The exclude file list must be expanded to include any other file systems in your 
environment that are not required to be included in the mksysb backup. As a 
general guideline, any large or frequently changing file systems in the root 
volume group will be better backed up to Tivoli Storage Manager and excluded 
from the mksysb. 

Example 4-11 shows a simple backup script which can be run from the 
management node to create a mksysb backup on each node and write the backup 
image to the NFS mounted /sysbackup file system on the management node. 
This script assumes that ssh keys are in place to allow passwordless access by 
root from the management node, which is normally the case on an IBM Smart 
Analytics System. The script also removes mksysb backups for each node that 
are older than 15 days (assuming weekly mksysb backups, this leaves at least 
two copies). 

Example 4-11 Backup script 

#!/bin/ksh 

for NODE in `lsnode` 
do 
    # Create a mksysb image of the node, honoring /etc/exclude.rootvg 
    ssh $NODE "mksysb -ie /sysbackup/${NODE}_`date +%d_%b_%Y`.mksysb" 

    # Remove mksysb backups older than 15 days 
    find /sysbackup -name "${NODE}*.mksysb" -mtime +15 -exec rm {} \; 
done 


This example is very simple and in a production environment has to be expanded 
to include error handling. For example, check that the NFS file system is 
mounted, that the directory has sufficient space for the mksysb, and that the 
mksysb command completes successfully. It is also wise to make the housekeeping command 
dependent on successful completion of the mksysb backup. 
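
The following is a minimal sketch of how such checks might be added, not a 
tested production script. The variables BACKUP_DIR, RETAIN_DAYS, and RUN, and 
the function names, are assumptions introduced for illustration. The sketch 
only checks that the target directory is writable on the node; a free-space 
check is left out because df output differs between platforms. 

```shell
#!/bin/sh
# Sketch only. BACKUP_DIR, RETAIN_DAYS, RUN and the function names are
# illustrative assumptions, not part of the IBM Smart Analytics System.
BACKUP_DIR=${BACKUP_DIR:-/sysbackup}
RETAIN_DAYS=${RETAIN_DAYS:-15}

# Image file name for a node, using the date format of Example 4-11.
image_name() {
    echo "${BACKUP_DIR}/${1}_$(date +%d_%b_%Y).mksysb"
}

# Back up one node. RUN is the remote-execution command (ssh by default);
# pointing it at another command allows a dry run.
backup_node() {
    node=$1
    image=$(image_name "$node")

    # The target directory must be writable on the node.
    if ! ${RUN:-ssh} "$node" "test -w $BACKUP_DIR"; then
        echo "$node: $BACKUP_DIR is not writable; backup skipped" >&2
        return 1
    fi

    # Housekeeping runs only if mksysb succeeded, so old images are never
    # removed without a fresh replacement.
    if ${RUN:-ssh} "$node" "mksysb -ie $image"; then
        find "$BACKUP_DIR" -name "${node}*.mksysb" -mtime +"$RETAIN_DAYS" -exec rm {} \;
    else
        echo "$node: mksysb failed; existing images kept" >&2
        return 1
    fi
}
```

The function can then be called once per node from the loop of the original 
script, for example: for NODE in `lsnode`; do backup_node "$NODE"; done 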

The /sysbackup file system must be included in your Tivoli Storage Manager file 
system backups to ensure that offline copies of the mksysb backups are available 
in the event of problems with the management node or its disks. 

Restore 

Restoring a mksysb backup from the management node to one of the nodes 
entails preparation and configuration work on the management node, which is 
configured as a NIM master, followed by a network boot of the node to be 
restored. 

The restore steps are as follows: 

1. Check that the server is defined to the NIM server. 

Use the lsnim command to check if the server you are restoring is defined to 
the NIM server and ready for a NIM operation. Example 4-12 shows a NIM 
machine status listing. 


Example 4-12 Listing a NIM machine status 

[mgmtnode:root:/home/root:] lsnim -l datanode1 
datanode1: 
   class          = machines 
   type           = standalone 
   connect        = nimsh 
   platform       = chrp 
   netboot_kernel = mp 
   if1            = Network-1 datanode1 0 
   cable_type1    = bnc 
   Cstate         = ready for a NIM operation 
   prev_state     = BOS installation has been enabled 
   Mstate         = not running 
   Cstate_result  = reset 


The output in Example 4-12 shows that datanode1 is defined to the NIM 
master as a machine, and is ready to start an operation (Cstate = ready 
for a NIM operation). If the status was not showing as ready, for example, 
if a previous NIM operation had failed, then reset the definitions using the 
following commands: 

nim -Fo reset datanode1 
nim -o deallocate -a subclass=all datanode1 

These commands reset any current NIM operations for the server and 
deallocate any resources that might remain assigned to the server. In the 
event of problems with the NIM restore, always clear any previous restore 
attempts with these commands before trying again. 

2. Define the mksysb resource to the NIM server. 

Use smitty or run commands from the command line to define the mksysb 
resource to the NIM server. To define the mksysb using smitty, use the command: 
smitty nim_mkres 

This command brings up the Define a Resource panel. Select mksysb as the 
resource type, then fill in a name for the mksysb resource, master for the 
server name, the location of the file, and optionally a descriptive comment 
(see Example 4-13). 

Example 4-13 Define a mksysb resource smitty panel 

                              Define a Resource 

Type or select values in entry fields. 
Press Enter AFTER making all desired changes. 

                                        [Entry Fields] 
* Resource Name                         [datanode1_mksysb] 
* Resource Type                         mksysb 
* Server of Resource                    [master]                                + 
* Location of Resource                  [/sysbackup/datanode1_15oct2010.mksysb] / 
  NFS Client Security Method            []                                      + 
  NFS Version Access                    []                                      + 
  Comments                              [Mksysb resource for datanode1] 


3. Create SPOT resource. 

After the mksysb resource has been defined, you have to create an associated 
shared product object tree (SPOT) resource before the mksysb resource can 
be used for a network boot. Either create a subdirectory of the /sysbackup file 
system to contain the SPOTs, or create a new file system. Use the command 
smitty nim_mkres again, and select a resource type of spot. Complete the 
panel by specifying an appropriate name for the spot resource, master for the 
server, and the mksysb resource you have just defined as the source for the 
install images (shown in Example 4-14). 

Example 4-14 Define a spot resource smitty panel 

                              Define a Resource 

Type or select values in entry fields. 
Press Enter AFTER making all desired changes. 

                                        [Entry Fields] 
  Resource Name                         [datanode1_spot] 
  Resource Type                         spot 
  Server of Resource                    [master]                                + 
  Source of Install Images              [nxtsmprod_mksysb]                      + 
  Location of Resource                  [/sysbackup/spot/datanode1_spot]        / 
  NFS Client Security Method            [] 
  NFS Version Access                    []                                      + 
  Expand file systems if space needed?  yes                                     + 
  Comments                              [Spot created from datanode1 mksysb image] 


4. Set up the NIM master. 

To set up the NIM master to enable a network boot using the mksysb image, 
perform these steps: 

a. Run the smitty command: smitty nim_bosinst. 

b. Select the server to be restored. 

c. Select mksysb as the installation type. 

d. Select the mksysb resource you have just defined. 

e. Select the spot resource you have just defined. 

f. Change “Initiate reboot and installation now?” to No. 

g. Press Enter to set the network boot up. 

5. Boot the LPAR. 

Perform the following steps to boot the LPAR being restored to system 
management services (SMS), configure the correct client and server IP 
addresses, and boot from the appropriate network adapter. The LPAR that 
you are restoring has to be inactive: 

a. Log on to the Hardware Management Console (HMC) web interface as the 
hscroot user. 

b. In the left hand panel, select Systems Management -> Servers, then the 
correct managed system. 

c. In the main panel, select the correct logical partition, and in the tasks 
list, select Operations -> Activate -> Profile. 

d. Check the option to open a console window, and change the boot mode in 
the advanced options to SMS. 

e. Select OK to start the LPAR booting. 

f. When the console window displays the SMS menu, select the option to 
configure remote IPL, then select the correct network adapter. 

g. Select BOOTP as the network service. 

h. Configure the client IP address as the management IP address of the 
LPAR being restored. 

i. Configure the server IP address as the management IP address of the 
management node. 

j. Perform a ping test to ensure that the IP addresses configured are working 
correctly. 

k. Press Escape until you return to the main SMS menu. 

l. Select the Select Boot Options item from the menu. 

m. Select the Select Install or Boot Device item from the menu. 

n. Select Network as the device type. 

o. Select the network adapter which you configured with an IP address 
earlier. 

p. Select Normal Boot Mode. 

q. Confirm your selection to exit and start the network boot. 

The LPAR is now performing a network boot from the management node's 
NIM server. After the LPAR has contacted the NIM server and started booting 
the mksysb image, prompts will appear on the console, which you will need to 
respond to in order to start the restore: 

- When prompted, confirm the console to be used. 

- When prompted, confirm the language to be used during installation. 

- When the “Welcome to Base Operating System Installation and 
Maintenance” panel appears, select option 2 to change settings. 

- The “System Backup Installation and Settings” panel appears. Confirm the 
details listed and type 0 to start the installation. 

6. The system will now restore the LPAR from the mksysb image. After the 
restore has completed, log on to the LPAR and confirm the restore has 
completed successfully. 
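
Steps 1 to 4 can also be driven from the command line on the NIM master 
instead of through the smitty panels. The following sketch only prints the 
commands so that they can be reviewed first. The function name and the 
<node>_mksysb and <node>_spot naming pattern are assumptions taken from the 
examples above, and the nim attributes shown need to be verified against the 
NIM documentation for your AIX level before use. 

```shell
#!/bin/sh
# Sketch: prints (does not run) NIM commands that correspond to the smitty
# panels in steps 1 to 4. nim_restore_cmds is a hypothetical helper name.
nim_restore_cmds() {
    node=$1                        # NIM machine name, for example datanode1
    image=$2                       # full path of the mksysb image on the master
    spotdir=${3:-/sysbackup/spot}  # parent directory for SPOT resources

    # Step 1: clear any previous restore attempt
    echo "nim -Fo reset $node"
    echo "nim -o deallocate -a subclass=all $node"
    # Step 2: define the mksysb resource
    echo "nim -o define -t mksysb -a server=master -a location=$image ${node}_mksysb"
    # Step 3: create the SPOT from the mksysb resource
    echo "nim -o define -t spot -a server=master -a source=${node}_mksysb -a location=$spotdir/${node}_spot ${node}_spot"
    # Step 4: enable the network boot without rebooting the client now
    echo "nim -o bos_inst -a source=mksysb -a mksysb=${node}_mksysb -a spot=${node}_spot -a boot_client=no -a accept_licenses=yes $node"
}
```

Calling nim_restore_cmds datanode1 /sysbackup/datanode1_15oct2010.mksysb 
prints five commands that can be checked and then run one at a time as root 
on the management node. 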

For further information about the process of booting from NIM and performing 
mksysb restores, see the AIX Installation and Migration Guide, SC23-6616-04 at 
the following link: 

http://publib.boulder.ibm.com/infocenter/aix/v6r1/topic/com.ibm.aix.install/doc/insgdrf/insgdrf_pdf.pdf 

You can learn the best practices for AIX backup and restore from the 
developerWorks article Best practices: AIX operating system-level backup and 
recovery for an IBM Smart Analytics System at: 

http://www.ibm.com/developerworks/data/bestpractices/smartanalytics/osbackup/index.html 

Linux 

For Linux-based IBM Smart Analytics System, the operating system does not 
provide an equivalent function to the AIX mksysb, so the operating system backup 
and recovery is not as straightforward. There are various options available for 
performing backup and recovery on a Linux system, each with advantages and 
disadvantages. Here we discuss the available options, but you need to test the 
solution you decide upon in your own environment and discuss with your IBM 
support representative. 

Tivoli Storage Manager 

IBM Tivoli Storage Manager (TSM) protects and manages a broad range of data, 
from workstations to the corporate server environment. The centralized, 
policy-based backup and recovery function is well suited to backing up and 
restoring Linux system files. 

Backup 

Tivoli Storage Manager (TSM) can be used to back up all files on the server, 
including those belonging to the operating system. Use Tivoli Storage Manager 
to perform incremental backup of all file systems daily. 

Keep database backups separated from file system backups on the Tivoli 
Storage Manager server by creating two nodes for each server, one for file 
system backups and one to be used by DB2 to back up the database. The Tivoli 
Storage Manager client has to be configured such that the database file systems 
are excluded from the file system backup. You can achieve this task by creating 
an include/exclude file containing entries for each of the file systems. This file 
also has to exclude the /db2home and /home file systems if they are mounted 
over NFS. 

Example 4-15 shows a sample include/exclude file. 

Example 4-15 Sample Tivoli Storage Manager include/exclude file contents 

exclude.fs "/db2fs/bculinux/NODE0001" 
exclude.fs "/db2fs/bculinux/NODE0002" 
exclude.fs "/db2fs/bculinux/NODE0003" 
exclude.fs "/db2fs/bculinux/NODE0004" 
exclude.fs "/home" 
exclude.fs "/db2home" 


You can schedule Tivoli Storage Manager backups by creating a schedule on the 
Tivoli Storage Manager server, and associate the Tivoli Storage Manager file 
system nodes with the schedule. The clients must then be configured to run the 
Tivoli Storage Manager Client Acceptor Daemon which runs the local Tivoli 
Storage Manager scheduler daemon at regular intervals. The local Tivoli Storage 
Manager scheduler daemon will run the backups at the appropriate times. For 
details about configuring the Tivoli Storage Manager client acceptor daemon, 
see the following web page: 

http://www-01.ibm.com/support/docview.wss?rs=663&context=SSGSG7&q1=linux+start+dsmcad&uid=swg21240599&loc=en_US&cs=utf-8&lang=en 

Restore 

Although Tivoli Storage Manager provides good facilities for restoring individual 
files, performing a “bare metal” restore of an entire server is more problematic. 
Because Tivoli Storage Manager does not provide any way of booting the 
server from a backup image, alternative methods have to be used. One option is 
to create a custom boot CD which includes the Tivoli Storage Manager client, 
and use it to perform the restore. 

Although a full explanation of this activity is beyond the scope of this book, here 
are a few considerations: 

► Ensure that you have the volume group, logical volume, and file system layout 
documented in case you need to recreate it. 

► Choose a Linux distribution that allows an easy way to create a custom 
Live-CD so that you can include the Tivoli Storage Manager client. A guide is 
available at this address: 

http://www.ibm.com/developerworks/linux/library/l-fedora-livecd/ 

► After you have booted from your live CD, mount your recreated file systems at 
temporary mount points (for example, under /mnt) and restore each one from 
Tivoli Storage Manager. 

► If you are recreating your file systems, you must ensure that mount point 
directories are created at the appropriate places in the restored file systems. 
This needs to include mount point directories for dynamic file systems such 
as /proc, /sys, /dev, /dev/pts, because these will not be recreated by the Tivoli 
Storage Manager restore. You can check the /etc/fstab file which you restore 
from Tivoli Storage Manager to ensure that mount points have been created 
for all file systems. 

► Mount the /dev directory from your live CD onto the /dev mount point in the 
restored root file system (mount --bind /dev /mnt/dev). 

► Reinstall the grub boot loader by chroot-ing into the restored file systems 
(chroot /mnt grub-install /dev/sda). 

If you are considering this as a restore process, the steps must be tested thoroughly and 
expanded into a full recovery procedure. 
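
The final considerations in the list above can be sketched as a short command 
sequence. The helper below only prints the commands; its name is hypothetical, 
and it assumes that the restored root file system is mounted at /mnt and that 
the boot disk is /dev/sda, both of which need to be adjusted to your layout. 

```shell
#!/bin/sh
# Sketch: print the post-restore commands for a root file system that has
# been restored under ROOT. post_restore_cmds is a hypothetical helper name.
post_restore_cmds() {
    root=${1:-/mnt}
    disk=${2:-/dev/sda}

    # Mount points for dynamic file systems are not recreated by the restore.
    for d in proc sys dev dev/pts; do
        echo "mkdir -p $root/$d"
    done

    # Make the live CD device nodes visible inside the restored root.
    echo "mount --bind /dev $root/dev"

    # Reinstall the boot loader from inside the restored root.
    echo "chroot $root grub-install $disk"
}
```

Review the printed commands against the /etc/fstab file restored from Tivoli 
Storage Manager before running them. 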

Cristie Bare Machine Recovery 

Cristie Data Products, which offers data storage and backup solutions, integrates 
with Tivoli Storage Manager to provide a Bare Machine Recovery (BMR) solution 
for Linux. The Tivoli BMR product is available for resale through IBM. 

All files are backed up to Tivoli Storage Manager, and configuration files are 
created to reflect the disk layout (partitioning, volume groups, logical volumes, file 
systems) of the server. 

Recovery is performed by booting to a recovery environment provided either on a 
bootable CD ISO image, or as a network bootable PXE image. Recreation of the 
disk layout and recovery of the file system data is automated by the Tivoli BMR 
product. For further information about Tivoli BMR, see this IBM web address: 
http://www-01.ibm.com/software/tivoli/products/storage-mgr/cristie-bmr.html 

Or, go to the Cristie website at this address: 
http://www.cristie.com/products/tbmr/ 


4.6.2 Database backup and recovery 

The challenge in planning database backup and recovery strategies is to ensure 
continuous data availability and, at the same time, secure the data to be ready 
for recovery in a data loss or corruption situation. The backup and recovery 
planning must start from the beginning of the project and must consider both 
the recovery point and the recovery time objectives. These objectives must be 
documented and used as a starting point to design the backup and recovery 
strategies. 

The recovery time objective (RTO) is the time expected to recover the database 
from a data loss or corruption situation. It is set from the beginning of the project. 
Based on the data volume and the desired recovery time, the infrastructure must 
be architected to be able to achieve this objective. 
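
As a rough worked example of this sizing, the sustained restore throughput 
that the infrastructure has to deliver is the data volume divided by the 
recovery time. The sketch below is a simplification: the function name is 
hypothetical, the figures are invented for illustration, and it ignores 
rollforward time, tape mount time, and compression. 

```shell
#!/bin/sh
# required_mb_per_sec VOLUME_GB RTO_HOURS
# Prints the sustained restore throughput in MB/s, rounded up, needed to
# move VOLUME_GB of backup data within RTO_HOURS.
required_mb_per_sec() {
    mb=$(( $1 * 1024 ))
    secs=$(( $2 * 3600 ))
    echo $(( (mb + secs - 1) / secs ))
}
```

For instance, required_mb_per_sec 10240 8 prints 365, meaning that restoring 
10 TB within 8 hours needs about 365 MB per second in aggregate across all 
data modules and backup devices. 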

The recovery point objective (RPO) is the minimum point in time to which data 
must be recovered. Lost data beyond this point can be re-loaded from source 
files or be deemed acceptable data loss. This objective must be discussed and 
planned with the application owners and business users. The recovery point 
objective affects the database backup plan, database transactional log 
management, and table space design. 

Always have the recovery objectives aligned with the infrastructure to ensure that 
the objectives are reachable. The recovery strategy is tied to the data 
availability requirements and drives the backup plan. Online backup is 
preferable to offline backup because it provides concurrency with other data 
warehouse operations. You must also consider the granularity of the backup, 
such as taking advantage of table space level backup to increase the backup 
and recovery speed. 

To take an online backup, the database must use archive logging. When 
performing major system or database upgrades, it is advisable to do a full offline 
backup. Unlike a data recovery scenario that might involve the recovery of 
individual table spaces, a full database recovery is more efficient from a full 
offline database backup. 

You can benefit from IBM Tivoli Storage Manager in managing the database and 
table space backup images and the transaction logs associated with backup in 
an IBM Smart Analytics System environment. When using IBM Tivoli Storage 
Manager, plan the storage (disk) pool size and the number of tape drives per IBM 
Smart Analytics System data modules according to the backup and recovery 
time objectives to meet the desired performance and service level agreement. 

Tapes: To make the backup and recovery process faster, use two tape drives 
for each IBM Smart Analytics System data module. 


The IBM Smart Analytics System has an option to have extra host bus adapter 
(HBA) cards to support LAN-free based backup technologies. The LAN-free 
backup process sends the data directly to a centralized storage device, 
eliminating the traffic created by backup from the corporate network. The 
LAN-free backup is conducted through the corporate storage area network 
(SAN) to deliver the data directly to the storage and tape devices. 

Use the LAN-free backup technology for IBM Smart Analytics System to optimize 
the backup and recovery speed, especially when the network is an issue in 
meeting the backup and recovery objectives. This technology is efficient in 
handling large data volumes such as database backups, but not small file 
transfers such as transaction logs. The transaction log backup can go through the 
corporate network or have a dedicated network. IBM Tivoli Storage Manager can 
manage the backup transferred through either LAN-free devices or through the 
corporate network. 

When backing up transaction log files through Tivoli Storage Manager, the 
storage size must be large enough to accommodate, minimally, the log files from 
the last backup until the next backup. 

For more information about planning a backup and recovery strategy, see the 
following documentation: 

► The DB2 Information Center, at this address: 

https://publib.boulder.ibm.com/infocenter/db2luw/v9r7/index.jsp?topic=/com.ibm.db2.luw.admin.ha.doc/doc/c0005945.html 

► DB2 best practices: Building a recovery strategy for an IBM Smart Analytics 
System data warehouse at developerWorks: 

http://www.ibm.com/developerworks/data/bestpractices/isasrecovery/index.html 

► DB2 best practices: DB2 instance recovery for IBM Smart Analytics System 
at developerWorks: 

http://www.ibm.com/developerworks/data/library/techarticle/dm-1010db2instancerecovery/index.html 

Data warehouse database design considerations for recovery 

Database design can affect backup performance and recovery efficiency. How 
data is placed and spread across the table spaces can impact performance of 
applications loading data, as well as the table space level backup and recovery. 

Create only the table spaces required. An excessive number of table spaces can 
increase complexity and administration tasks. The size of the recovery history 
files can grow such that starting and stopping the database, and creating or 
dropping table spaces, become very slow. 

Consider these possibilities when designing a data warehouse database: 

► Classify the data: 

One characteristic of data warehouse environment is the large data volume. 
The update frequency can vary among the data. You can classify your data 
based on the update frequency into active, for frequently updated data; 
warm, for less frequently updated data; and cold, for static or historical data. 
The data is then placed in separate table spaces by category. This 
approach allows you to apply particular backup strategies for each group of 
table spaces, for example, back up the table spaces with active data more 
frequently. 

► Partition data by range: 

You can also distribute data using the DB2 range partitioning feature and place 
range partitioned data in separate table spaces. Evenly distributing the data 
among the database partitions and table partitions improves both data load 
performance and recovery capabilities. 

When loading data into a range partitioned table, load the data by range to 
facilitate adding and purging data in the data warehouse and to take advantage 
of the range partitioned table resources. 

► Staging tables: 

Avoid non-logged data load (by the load utility, or non-logged inserts) to the 
production tables directly. The non-logged load operation places the table 
space in a backup pending state. Consider performing the non-logged load into 
staging tables, then performing a logged insert to the final table, reading from 
the staging tables. This approach logs the transaction (to allow rollforward 
recovery) and avoids placing the table space in a backup pending state. 

The staging tables must be placed in their own table space because the staging 
tables do not need to be backed up. 

► Referential integrity: 

Consider placing tables that have referential integrity relationships in the 
same table space. All the parent and child tables will be backed up and 
restored along with the table space. This method can reduce the number of 
table spaces to be recovered compared to spreading the child tables in 
unique table spaces. 
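
The staging table approach described in the list above can be sketched as two 
DB2 commands per table. The helper below only prints them. The schema names 
stg and edw, the delimited input file, and the choice of the NONRECOVERABLE 
load option are assumptions for illustration; the staging tables are assumed 
to live in their own table space. 

```shell
#!/bin/sh
# Sketch: print a non-logged load into a staging table followed by a logged
# insert into the production table. staged_load_cmds is a hypothetical name.
staged_load_cmds() {
    file=$1     # delimited input file
    table=$2    # table name, present in both the stg and edw schemas

    # Step 1: fast, non-logged load into the staging table only
    echo "db2 \"load from $file of del replace into stg.$table nonrecoverable\""

    # Step 2: logged insert, so the production table stays recoverable
    echo "db2 \"insert into edw.$table select * from stg.$table\""
}
```

For large volumes, the insert in the second step might need to be split into 
several smaller units of work to keep active log space requirements 
manageable. 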

Backup guidelines 

The purpose of backing up data is to ensure data availability, including during 
the time that the backup is taken. Online database and table space backup 
allows you to perform backup without stopping other database activities. 
Utility throttling automatically regulates the resources used by the backup 
job, thus minimizing the performance impact on the database. 

Here we list guidelines for the database backup of the IBM Smart Analytics 
System offerings: 

► Perform full database backup on a quarterly basis. Take a full database 
backup before and after the IBM Smart Analytics System expansion and DB2 
software upgrades. 

► Table space level backup takes less time than full database backups. If time is 
a concern, take table space level backup. Consider archiving inactive data. 

► Perform full online table space backups whenever it is possible. If the active 
data in the hot table spaces is often updated, take incremental backup. 
Perform full table space backup (including logs) for the active data at least 
twice a week. 

► The catalog partition must be backed up on a daily basis to ensure that any 
DDL issued is synchronized across data node and administration node 
backups. If the catalog database partition holds a large amount of data, 
perform incremental backups, but perform full table space backup on a daily 
basis for the catalog table space. 

► Perform a full table space backup when a new table is added. Because a new 
table has no data, this backup is quick and saves time during recovery. 

► The point-in-time recovery of a single table space in an individual database 
partition is not possible in a partitioned database environment (DPF). To 
recover a table space to the same point-in-time across all database partitions 
requires rolling forward to the end of log for all database partitions. If you have 
to perform a point-in-time recovery, run a full database recovery instead of a 
table space recovery. 

► When performing database or table space backups on an IBM Smart 
Analytics System environment, run the backup job in parallel across the data 
nodes. For example, perform the backup task for the first database partition 
on each data node in parallel, then when it is finished, start for the second 
database partition and so on. When you have completed the backup of the 
first table space, back up the second table space, then the third, until you 
have backed up all table spaces, and always do one database partition per 
data node at a time, running in parallel across all data nodes. 

► Back up the sqllib directory in the DB2 instance user home directory using 
the operating system backup mechanism. 

► Back up the data definition language (DDL) of the production database 
structure. Use db2look to generate the DDL file after a structure 
change and save the file. Another good practice is to keep the change history 
of the instance and database configuration. 

► On the IBM Smart Analytics System environment, there are other databases 
besides the user database: 

- The IBM InfoSphere Warehouse metadata database (ISWMETA) hosted 
on the warehouse applications nodes 

- The IBM Cognos content store hosted on the BI nodes 

These databases must also be backed up. See the IBM Smart Analytics 
System User’s Guide for your respective version for the backup procedures 
for these metadata databases. 

Example 4-16 shows a command for performing an online backup of the catalog 
database partition. 

Example 4-16 Catalog database partition backup 

# IBM Smart Analytics System 7600 - Full online catalog partition Backup - Partition 0 
db2 backup database edwp on dbpartitionnum(0) online use tsm 


Example 4-17 shows how to take online table space backups by database 
partition in a two-data-node IBM Smart Analytics System 7600 environment. In 
the example, a backup operation is issued to each node in parallel to help ensure 
that there is no skew in performance, which would occur if backups were issued 
to only one node at a time. 

Example 4-17 Table space level online backup 

# IBM Smart Analytics System 7600 - Tablespaces Backup - Partitions 1 and 5 
backup database edwp on dbpartitionnums (1,5) tablespace(pd_ts1, pd_ts1ix) 
online use tsm open 2 sessions util_impact_priority 33 include logs 

# IBM Smart Analytics System 7600 - Tablespaces Backup - Partitions 2 and 6 
backup database edwp on dbpartitionnums (2,6) tablespace(pd_ts1, pd_ts1ix) 
online use tsm open 2 sessions util_impact_priority 33 include logs 

# IBM Smart Analytics System 7600 - Tablespaces Backup - Partitions 3 and 7 
backup database edwp on dbpartitionnums (3,7) tablespace(pd_ts1, pd_ts1ix) 
online use tsm open 2 sessions util_impact_priority 33 include logs 

# IBM Smart Analytics System 7600 - Tablespaces Backup - Partitions 4 and 8 
backup database edwp on dbpartitionnums (4,8) tablespace(pd_ts1, pd_ts1ix) 
online use tsm open 2 sessions util_impact_priority 33 include logs 

Example 4-18 shows how to perform online database backup by database 
partitions in a two-data-node IBM Smart Analytics System 7600 environment. 

Example 4-18 Database level online backup example 

# IBM Smart Analytics System 7600 - Full online database Backup - Partitions 1 and 5 
db2 backup database edwp on dbpartitionnums(1,5) online use tsm 

# IBM Smart Analytics System 7600 - Full online database Backup - Partitions 2 and 6 
db2 backup database edwp on dbpartitionnums(2,6) online use tsm 

# IBM Smart Analytics System 7600 - Full online database Backup - Partitions 3 and 7 
db2 backup database edwp on dbpartitionnums(3,7) online use tsm 

# IBM Smart Analytics System 7600 - Full online database Backup - Partitions 4 and 8 
db2 backup database edwp on dbpartitionnums(4,8) online use tsm 
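
The partition groupings used in these examples follow a simple pattern: each 
backup command covers the same relative database partition on every data 
node. A minimal sketch that derives the groupings is shown below. It assumes 
that data partitions are numbered consecutively from 1, node by node, as in 
the two-data-node examples; the function name is hypothetical and the 
commands are only printed, not run. 

```shell
#!/bin/sh
# backup_waves DB NODES PARTITIONS_PER_NODE
# Prints one online BACKUP command per wave. Wave k contains the k-th
# database partition of every data node, so each node handles exactly one
# partition at a time while all nodes work in parallel.
backup_waves() {
    db=$1
    nodes=$2
    per_node=$3
    k=1
    while [ "$k" -le "$per_node" ]; do
        list=""
        n=0
        while [ "$n" -lt "$nodes" ]; do
            p=$(( n * per_node + k ))   # k-th partition on node n
            list="${list:+$list,}$p"
            n=$(( n + 1 ))
        done
        echo "db2 \"backup database $db on dbpartitionnums ($list) online use tsm\""
        k=$(( k + 1 ))
    done
}
```

With two data nodes and four database partitions per node, backup_waves edwp 2 4 
prints four commands for the partition groups (1,5), (2,6), (3,7), and (4,8), 
matching Example 4-18. Run each wave only after the previous one has finished. 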


When backing up with IBM Tivoli Storage Manager, use the db2adutl command 
to query the backup images and log archives. You can also use db2adutl to 
retrieve log files for restore. 

When taking a backup, use the DB2 list utilities command to track the 
progress. 

To check the consistency of a backup image, use the db2ckbkp command to 
verify the backup image integrity and to retrieve the backup image information 
stored in the backup header. 

For more information about DB2 backup command and utilities, see the following 
DB2 documentation: 

► DB2 backup overview: 

https://publib.boulder.ibm.com/infocenter/db2luw/v9r7/index.jsp?topic=/com.ibm.db2.luw.admin.ha.doc/doc/c0006150.html 

► DB2 list utilities command: 

https://publib.boulder.ibm.com/infocenter/db2luw/v9r7/index.jsp?topic=/com.ibm.db2.luw.admin.cmd.doc/doc/r0011550.html 

► Checking DB2 backup image: 

https://publib.boulder.ibm.com/infocenter/db2luw/v9r7/index.jsp?topic=/com.ibm.db2.luw.admin.cmd.doc/doc/r0002585.html 

► Managing DB2 objects with Tivoli Storage Manager: 

https://publib.boulder.ibm.com/infocenter/db2luw/v9r7/index.jsp?topic=/com.ibm.db2.luw.admin.cmd.doc/doc/r0002077.html 

Recovery guidelines 

To have an effective and time saving recovery, it is essential to analyze what is 
required for recovery, then decide on the recovery scope and steps. A disaster 
recovery strategy and data recovery strategy must be implemented as separate 
processes. A database backup is best suited to disaster recovery, whereas a 
table space backup strategy is best suited to data recovery. 

Starting from DB2 version 9.5, you can rebuild the database from table space 
level backups. No full database backup is required to rebuild the database. Each 
table space backup image has the entire structure for the database that can be 
used to restore the entire database structure. After the database structure is 
restored, you can restore the table spaces in sequence and roll forward to the 
point of recovery desired. 

When performing a full database restore, start from restoring the catalog partition 
from the latest catalog partition backup, then restore the rest of the database 
partitions. 

Table space recovery works on the individual table space level; it is more 
granular, flexible, and faster than a full database restore. 

DB2 provides tools to help identify the consistency of the data and the database 
structure. You can use these tools to analyze the database and design a suitable 
recovery plan when a recovery is required. 

Use the inspect command to identify where the corrupted data is. The inspect 
command can be run without deactivating the database. Use the inspect 
command to verify the database architectural integrity and to check the database 
pages for data consistency. You can save the output to a file and format the 
results using the db2inspf command. 

The db2dart command is suitable to verify the architectural correctness of the 
database and the objects within them. db2dart accesses the data and metadata 
in a database by reading them directly from disk, therefore, run this command 
only when there are no active connections on the database. 

After the data corruption scope is identified, you can take a proper recovery 
action: 

► If the corruption occurs on temporary objects such as the staging tables,
no recovery is required.

► If the corruption occurs at the table level, recover only that table from the
backup using dropped table recovery.

► If the corruption occurs at the table space level, a table space restore is
sufficient.

► If the corruption occurs at the database partition level, a full database
partition restore is required.

► If the data error is caused by an application, consider the possibility of
unloading and reloading the data using the application.
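The decision list above can be expressed as a simple lookup, which can be handy in a runbook script. This is a sketch only: the enumeration and function names are hypothetical, and the action strings paraphrase the guidelines rather than name DB2 commands.

```python
from enum import Enum


class CorruptionScope(Enum):
    """Where the inspect or db2dart analysis located the problem."""
    TEMPORARY_OBJECT = "temporary object"
    TABLE = "table"
    TABLE_SPACE = "table space"
    DATABASE_PARTITION = "database partition"
    APPLICATION_DATA_ERROR = "application data error"


# Recovery action per scope, paraphrased from the guidelines above.
RECOVERY_ACTIONS = {
    CorruptionScope.TEMPORARY_OBJECT: "no recovery required",
    CorruptionScope.TABLE: "recover only the affected table from the backup",
    CorruptionScope.TABLE_SPACE: "restore the table space",
    CorruptionScope.DATABASE_PARTITION: "restore the full database partition",
    CorruptionScope.APPLICATION_DATA_ERROR:
        "consider unloading and reloading the data with the application",
}


def recovery_action(scope):
    """Return the recovery action suggested for the given corruption scope."""
    return RECOVERY_ACTIONS[scope]


if __name__ == "__main__":
    for scope in CorruptionScope:
        print(f"{scope.value}: {recovery_action(scope)}")
```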

Example 4-19 shows how to restore a table space across all database partitions.

Example 4-19 Table space restore across all database partitions

# IBM Smart Analytics System 7600 - Table space restore across all database
# partitions

db2_all "<<+1<|| db2 \"restore database edwp tablespace (pd_ts1, pd_ts1ix) online use TSM taken at <bkp_timestamp> replace existing\""

# Once the restore is completed, perform the roll forward

db2 "rollforward database edwp to end of logs on dbpartitionnum (1 to 8) tablespace (pd_ts1, pd_ts1ix) online"


For more examples and information about DB2 restore command and utilities, 
see the following DB2 documentation: 

► DB2 restore overview:

https://publib.boulder.ibm.com/infocenter/db2luw/v9r7/index.jsp?topic=/com.ibm.db2.luw.admin.ha.doc/doc/c0006237.html

► DB2 inspect utility:

https://publib.boulder.ibm.com/infocenter/db2luw/v9r7/index.jsp?topic=/com.ibm.db2.luw.admin.cmd.doc/doc/r0008633.html

► DB2 database analysis and report tool (db2dart):

https://publib.boulder.ibm.com/infocenter/db2luw/v9r7/index.jsp?topic=/com.ibm.db2.luw.admin.trb.doc/doc/c0020760.html

► DB2 recover overview:

https://publib.boulder.ibm.com/infocenter/db2luw/v9r7/topic/com.ibm.db2.luw.admin.ha.doc/doc/t0011800.html

► Recovering dropped table:

https://publib.boulder.ibm.com/infocenter/db2luw/v9r7/index.jsp?topic=/com.ibm.db2.luw.admin.ha.doc/doc/t0006318.html

► Managing objects with Tivoli Storage Manager (db2adutl):

https://publib.boulder.ibm.com/infocenter/db2luw/v9r7/index.jsp?topic=/com.ibm.db2.luw.admin.cmd.doc/doc/r0002077.html

5


Monitoring tools 


In this chapter we discuss methods for providing proactive monitoring for an IBM 
Smart Analytics System, in particular, how to integrate into an existing IBM Tivoli 
Monitoring infrastructure. 

Proactive monitoring is important for the IBM Smart Analytics System, just as it is
for any other critical piece of IT infrastructure, because it provides early
notification of problems, which allows you to minimize or avoid outages that
affect your end users.

5.1 Cluster and operating system monitoring


An important aspect in the availability and health of the IBM Smart Analytics 
System is monitoring the operating system and hardware of the database and 
application modules in the system. This includes monitoring such items as these: 

► File system utilization 

► Hardware errors on the servers 

► Critical resource utilization problems (for example, CPU, memory) 


5.1.1 AIX and Linux 

The AIX and Linux based IBM Smart Analytics System offerings vary slightly in 
monitoring requirements based on their underlying operating systems. For AIX, 
hardware monitoring can be done both from within the OS, and also from the 
Hardware Management Console supplied as part of the IBM Smart Analytics 
System. For Linux, hardware monitoring is done by the service processor 
included in each System x server. 


5.1.2 IBM Systems Director 

The IBM Smart Analytics System comes configured with an IBM Systems 
Director environment on Linux-based offerings that provides monitoring 
capabilities for the cluster. The default configuration consists of an IBM Systems 
Director server running on the Management node, and IBM Systems Director 
agents installed on all other nodes in the cluster. 

The IBM Systems Director server communicates directly with the service 
processor on each of the servers in the IBM Smart Analytics System. This allows 
monitoring and reporting of any hardware faults that might occur on the servers. 

Integrating IBM Systems Director with enterprise monitoring

The IBM Systems Director server provides a number of ways to integrate the 
hardware monitoring it provides for the IBM Smart Analytics System into your 
enterprise monitoring system. You can create an Event Automation Plan for your 
servers that allows you to specify any number of actions when the events of the 
type and/or severity you specify occur. The actions you can specify include: 

► Send a Tivoli Enterprise Console® event 

► Send an SNMP trap 

► Send an email 

► Update a log file 

The most straightforward method for integrating into an IBM Tivoli Monitoring
environment is to configure the IBM Systems Director server to forward a Tivoli
Enterprise Console event directly. Alternatives such as sending an SNMP trap
to an IBM Tivoli Netcool® OMNIbus server, or simply writing the event to a log file
that a Tivoli log file adapter is configured to monitor, also work.

Example of configuring IBM Systems Director 

Configuration of the IBM Systems Director server is done by accessing the
product’s web interface. By default, the web interface uses port 8421 (for http)
or port 8422 (for https). Use https if possible to ensure that passwords are not
transmitted over the network without encryption. Use the following URL for
accessing the IBM Systems Director web interface:

https://management_node_hostname:8422/ibm/console

When you are prompted, log in with a user who has been authorized to administer
IBM Systems Director by being a member of the smadmin group. Initially, this is
only the root user, but it is best to add individual administrators to this group so
that you do not need to log in directly as the root user.

After you have successfully logged in, you are presented with a panel similar to
the one in Figure 5-1.



Figure 5-1 Initial IBM Systems Director panel


You can use the Navigate Resources link from the task list in the left panel, then
the All Systems group, to explore the resources that have been configured in
your environment and to check that all the servers in your IBM Smart Analytics
System are present. You can optionally create custom groups for your servers
from this panel if you want to configure unique alerting for various server types.
Alternatively, you can just use the preconfigured dynamic All Systems group.

You can now create an Event Automation Plan that allows you to forward alerts 
generated by IBM Systems Director for problems with the servers in the IBM 
Smart Analytics System to your enterprise monitoring environment. From the 
task list in the left panel, select the Automation menu, then select Event 
Automation Plans. In the main panel, click Create to launch the wizard. 

Complete the wizard panels as follows:

► Name and Description:

Enter a name for the Event Automation Plan, such as Forward_Alerts.

► Targets: 

Select either one of the custom groups you created for your systems or the 
default All Systems dynamic group, and add it to the Selected Systems list on 
the right. 

► Events: 

Configure the type of events you need to forward to your enterprise 
monitoring environment. One approach is to use the Event Severity filter, and 
include all Fatal and Critical events. You can also set up CPU, memory, and 
disk utilization thresholds at which you want to be alerted. 

► Event Actions: 

Select Create to create a new event action. Select from the list the 
appropriate action to forward the event to your enterprise monitoring. This 
might be sending a Tivoli Enterprise Console event, sending an SNMP trap or 
inform request, or even running a custom program on the management node 
to forward the alert for you. 

After you have created your new event action, be sure to use the Test button 
to test that the action works correctly and that your enterprise monitoring 
receives the alert. Testing an SNMP trap event action will result in an SNMP 
trap being received on your specified SNMP server, similar to this example: 

2010-09-29 14:26:33 172.16.10.10(via TCP: [9.26.120.212]:-19469) TRAP, SNMP v1, community public
SNMPv2-SMI::enterprises.2.6.159.201.1.3.1 Enterprise Specific Trap (1) Uptime: 1 day, 12:45:13.43
SNMPv2-SMI::enterprises.2.6.159.202.1 = STRING: "Director.Test.Action"
SNMPv2-SMI::enterprises.2.6.159.202.2 = STRING: "Informational"
SNMPv2-SMI::enterprises.2.6.159.202.5 = STRING: "An internally generated event for the purpose of testing the 'Forward alert to central monitoring - 9/29/10 10:26 AM' action configuration."
SNMPv2-SMI::enterprises.2.6.159.202.6 = STRING: "Alert"

► Time Range: 

Select the time range over which you want the Event Automation Plan to be
effective. This is normally the default of All the time.

► Summary:

This panel displays a summary of the options configured previously, similar to 
Figure 5-2. 



Figure 5-2 Event Automation Plan summary 
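If you forward events as SNMP traps, a small receiver-side filter can decide which traps to escalate. The sketch below parses the "OID = STRING" lines of a trap in the textual form shown in the test example earlier and forwards only Fatal and Critical events. It rests on an assumption drawn solely from that sample: that the variable whose OID ends in .202.2 carries the severity. The function names are hypothetical.

```python
import re

# One variable binding as printed in the sample trap: OID = STRING: "value"
VARBIND = re.compile(r'^(?P<oid>\S+)\s*=\s*STRING:\s*"(?P<value>.*)"\s*$')

# Assumption: in the sample trap, the OID ending in .202.2 holds the severity.
SEVERITY_OID_SUFFIX = ".202.2"
FORWARD_SEVERITIES = {"Fatal", "Critical"}

SAMPLE = '''SNMPv2-SMI::enterprises.2.6.159.202.1 = STRING: "Director.Test.Action"
SNMPv2-SMI::enterprises.2.6.159.202.2 = STRING: "Informational"
SNMPv2-SMI::enterprises.2.6.159.202.6 = STRING: "Alert"'''


def parse_varbinds(trap_text):
    """Return {oid: value} for every STRING variable binding in the text."""
    varbinds = {}
    for line in trap_text.splitlines():
        match = VARBIND.match(line.strip())
        if match:
            varbinds[match.group("oid")] = match.group("value")
    return varbinds


def should_forward(varbinds):
    """Forward only traps whose severity is Fatal or Critical."""
    for oid, value in varbinds.items():
        if oid.endswith(SEVERITY_OID_SUFFIX):
            return value in FORWARD_SEVERITIES
    return False  # no severity found: do not escalate


if __name__ == "__main__":
    parsed = parse_varbinds(SAMPLE)
    print(parsed, should_forward(parsed))
```

The test trap carries the severity Informational, so this filter would not escalate it; that matches the Event Severity filter suggested in the Events panel.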


5.2 DB2 monitoring 

The IBM Smart Analytics System consists of many DB2 database partitions and
nodes. In this section, we introduce the monitoring utilities that DB2 provides and
DB2 Performance Expert. These monitoring tools provide the capability to
gather information for problem determination, performance tuning, and trend
analysis.

5.2.1 DB2 monitoring utilities


The DB2 database server provides a comprehensive set of monitoring tools to help
database administrators manage DB2 instances and both single-partition and
multi-partition databases.

DB2 snapshot monitor and event monitors 

The DB2 snapshot monitor and event monitors are good for performance
monitoring. The database administrator can use these two monitoring tools to find
out why an application has a poor response time or to track an ongoing event.

The DB2 snapshot monitor gathers information about the system activity at a
specific point in time. It takes a “picture” of database resource usage,
such as buffer pools, memory, connection activities, statements, and others. You
can analyze the snapshots taken over a period of time to understand the application
behavior and system resource usage trends and take proactive action to
maintain a healthy DB2 system.

The database administrator can use the DB2 event monitors to track an event in
the database for a period of time. The event monitor records the complete
transaction activity and stores the information in a file or a table. DB2 provides a
set of predefined events for monitoring the server activities; for example, a
CONNECTIONS event tracks database connections. You can also generate your
own events. An event monitor can be started and stopped at any time. When using
an event monitor, limit the information collected to the level needed, because
event monitors also consume resources.

For further references about DB2 snapshot monitor and event monitors, see the 
documentation available at the following website: 

► DB2 snapshot monitor:

https://publib.boulder.ibm.com/infocenter/db2luw/v9r7/index.jsp?topic=/com.ibm.db2.luw.admin.mon.doc/doc/c0006003.html

► DB2 event monitors:

https://publib.boulder.ibm.com/infocenter/db2luw/v9r7/index.jsp?topic=/com.ibm.db2.luw.admin.mon.doc/doc/r0005993.html

DB2 administrative views and table functions 

Analyzing the DB2 snapshot monitor output for a partitioned
database environment can be challenging. DB2 administrative views and table
functions provide a simple means to gather specific or globally aggregated
snapshot data from a specific database partition or all database partitions. DB2
administrative views provide an easy-to-use application programming interface to 
DB2 administrative functions through SQL. 

Example 5-1 shows a sample output of the SNAPAPPL administrative view.

Example 5-1 Gathering snapshot information using SNAPAPPL administrative view

C:\>db2 select AGENT_ID, ROWS_READ, ROWS_WRITTEN, UOW_ELAPSED_TIME_S from sysibmadm.snapappl

AGENT_ID ROWS_READ ROWS_WRITTEN UOW_ELAPSED_TIME_S

131128 75 81
131127 78 77
65590 83 75

65592
65591
65590

17 record(s) selected.
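After rows from a view such as SNAPAPPL have been fetched, aggregating them per database partition is a quick way to spot uneven work distribution. The sketch below assumes that each row is available as a dictionary containing the DBPARTITIONNUM column and a numeric metric such as ROWS_READ. The sample rows are hypothetical and are not taken from Example 5-1.

```python
from collections import defaultdict


def aggregate_by_partition(rows, metric):
    """Sum one metric per database partition from snapshot-style rows."""
    totals = defaultdict(int)
    for row in rows:
        totals[row["DBPARTITIONNUM"]] += row[metric]
    return dict(totals)


def skew_ratio(per_partition):
    """Return max/average across partitions; 1.0 means perfectly even."""
    values = list(per_partition.values())
    if not values:
        return 1.0
    average = sum(values) / len(values)
    return max(values) / average if average else 1.0


if __name__ == "__main__":
    # Hypothetical rows, as they might be fetched from SYSIBMADM.SNAPAPPL.
    rows = [
        {"DBPARTITIONNUM": 1, "ROWS_READ": 100},
        {"DBPARTITIONNUM": 1, "ROWS_READ": 50},
        {"DBPARTITIONNUM": 2, "ROWS_READ": 50},
    ]
    totals = aggregate_by_partition(rows, "ROWS_READ")
    print(totals, sum(totals.values()), skew_ratio(totals))
```

A ratio well above 1.0 suggests that one partition is doing noticeably more work than the average partition.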


Example 5-2 shows a list of the administrative views available in DB2 version 9.7. 
The views, routines, and procedures with SNAP in their name actually call 
snapshot under the covers, whereas the MON and WLM routines and 
procedures do not. 

Example 5-2 DB2 version 9.7 administrative views

C:\>db2 list tables for schema SYSIBMADM

Table/View                      Schema          Type
------------------------------- --------------- -----
ADMINTABCOMPRESSINFO            SYSIBMADM       V
ADMINTABINFO                    SYSIBMADM       V
ADMINTEMPCOLUMNS                SYSIBMADM       V
ADMINTEMPTABLES                 SYSIBMADM       V
APPLICATIONS                    SYSIBMADM       V
APPL_PERFORMANCE                SYSIBMADM       V
AUTHORIZATIONIDS                SYSIBMADM       V
BP_HITRATIO                     SYSIBMADM       V
BP_READ_IO                      SYSIBMADM       V
BP_WRITE_IO                     SYSIBMADM       V
CONTACTGROUPS                   SYSIBMADM       V
CONTACTS                        SYSIBMADM       V
CONTAINER_UTILIZATION           SYSIBMADM       V
DBCFG                           SYSIBMADM       V
DBMCFG                          SYSIBMADM       V
DBPATHS                         SYSIBMADM       V
DB_HISTORY                      SYSIBMADM       V
ENV_FEATURE_INFO                SYSIBMADM       V
ENV_INST_INFO                   SYSIBMADM       V
ENV_PROD_INFO                   SYSIBMADM       V
ENV_SYS_INFO                    SYSIBMADM       V
ENV_SYS_RESOURCES               SYSIBMADM       V
LOCKS_HELD                      SYSIBMADM       V
LOCKWAITS                       SYSIBMADM       V
LOG_UTILIZATION                 SYSIBMADM       V
LONG_RUNNING_SQL                SYSIBMADM       V
NOTIFICATIONLIST                SYSIBMADM       V
OBJECTOWNERS                    SYSIBMADM       V
PDLOGMSGS_LAST24HOURS           SYSIBMADM       V
PRIVILEGES                      SYSIBMADM       V
QUERY_PREP_COST                 SYSIBMADM       V
REG_VARIABLES                   SYSIBMADM       V
SNAPAGENT                       SYSIBMADM       V
SNAPAGENT_MEMORY_POOL           SYSIBMADM       V
SNAPAPPL                        SYSIBMADM       V
SNAPAPPL_INFO                   SYSIBMADM       V
SNAPBP                          SYSIBMADM       V
SNAPBP_PART                     SYSIBMADM       V
SNAPCONTAINER                   SYSIBMADM       V
SNAPDB                          SYSIBMADM       V
SNAPDBM                         SYSIBMADM       V
SNAPDBM_MEMORY_POOL             SYSIBMADM       V
SNAPDB_MEMORY_POOL              SYSIBMADM       V
SNAPDETAILLOG                   SYSIBMADM       V
SNAPDYN_SQL                     SYSIBMADM       V
SNAPFCM                         SYSIBMADM       V
SNAPFCM_PART                    SYSIBMADM       V
SNAPHADR                        SYSIBMADM       V
SNAPLOCK                        SYSIBMADM       V
SNAPLOCKWAIT                    SYSIBMADM       V
SNAPSTMT                        SYSIBMADM       V
SNAPSTORAGE_PATHS               SYSIBMADM       V
SNAPSUBSECTION                  SYSIBMADM       V
SNAPSWITCHES                    SYSIBMADM       V
SNAPTAB                         SYSIBMADM       V
SNAPTAB_REORG                   SYSIBMADM       V
SNAPTBSP                        SYSIBMADM       V
SNAPTBSP_PART                   SYSIBMADM       V
SNAPTBSP_QUIESCER               SYSIBMADM       V
SNAPTBSP_RANGE                  SYSIBMADM       V
SNAPUTIL                        SYSIBMADM       V
SNAPUTIL_PROGRESS               SYSIBMADM       V
TBSP_UTILIZATION                SYSIBMADM       V


The DB2 relational monitoring interfaces, introduced in DB2 9.7, are enhanced
reporting and monitoring tools that can capture information about the database
system, data objects, and the package cache. A DBA can access the interfaces
through SQL to quickly identify issues during performance monitoring and problem
determination situations. The DB2 relational monitoring interfaces are
lightweight, efficient, and have low impact on the system.

The functions available for system level, activity level, and data object level
monitoring are as follows:

► System level:

- MON_GET_CONNECTION

- MON_GET_CONNECTION_DETAILS

- MON_GET_SERVICE_SUBCLASS

- MON_GET_SERVICE_SUBCLASS_DETAILS

- MON_GET_UNIT_OF_WORK

- MON_GET_UNIT_OF_WORK_DETAILS

- MON_GET_WORKLOAD

- MON_GET_WORKLOAD_DETAILS

► Activity level:

- MON_GET_ACTIVITY_DETAILS

- MON_GET_PKG_CACHE_STMT

- MON_GET_PKG_CACHE_STMT_DETAILS (1)

► Data object level:

- MON_GET_BUFFERPOOL

- MON_GET_CONTAINER

- MON_GET_EXTENT_MOVEMENT_STATUS

- MON_GET_INDEX

- MON_GET_TABLE

- MON_GET_TABLESPACE

In 6.3, “DB2 Performance troubleshooting” on page 152, we show examples of 
using the relational monitoring interfaces to gather database performance 
metrics. 

The following websites provide further information about DB2 administrative 
views, table functions, and DB2 relational monitoring interfaces: 

► DB2 administrative views and table functions:

https://publib.boulder.ibm.com/infocenter/db2luw/v9r7/index.jsp?topic=/com.ibm.db2.luw.admin.mon.doc/doc/t0010418.html

https://publib.boulder.ibm.com/infocenter/db2luw/v9r7/index.jsp?topic=/com.ibm.db2.luw.sql.rtn.doc/doc/c0022652.html

► DB2 relational monitoring interfaces:

https://publib.boulder.ibm.com/infocenter/db2luw/v9r7/index.jsp?topic=/com.ibm.db2.luw.wn.doc/doc/c0055021.html


(1) Available with DB2 Version 9.7 Fix Pack 1 and later

DB2 monitor utility: db2top

The db2top utility is a monitoring tool distributed with DB2 for Linux and UNIX
environments. db2top collects DB2 snapshot monitor information cumulatively
and gives real-time delta values of snapshot metrics. db2top is a user-friendly tool
with an interactive text-based interface that gives the DBA a better
understanding of the metrics. The utility consolidates multiple snapshot
options and categorizes the information to make the output easy to interpret.
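The idea of turning cumulative snapshot counters into delta values can be illustrated with a short calculation. This sketch is not db2top's implementation; it only shows the arithmetic, and the counter names and sample values are hypothetical.

```python
def snapshot_delta(previous, current, interval_seconds):
    """Return the per-second rate of change for each cumulative counter.

    Counters missing from the previous sample, or lower than before
    (for example after a monitor reset), are skipped.
    """
    if interval_seconds <= 0:
        raise ValueError("interval_seconds must be positive")
    rates = {}
    for name, value in current.items():
        if name in previous and value >= previous[name]:
            rates[name] = (value - previous[name]) / interval_seconds
    return rates


if __name__ == "__main__":
    # Two hypothetical samples of cumulative counters taken 2 seconds apart.
    before = {"rows_read": 1000, "commits": 10}
    after = {"rows_read": 1600, "commits": 13}
    print(snapshot_delta(before, after, 2))
```

Rates computed this way show what the system is doing now, whereas the raw counters only show totals since monitoring began.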

You can run db2top in either interactive mode or batch mode. In
interactive mode, users can browse between the snapshot options. When
running db2top in batch mode, you can store the performance information output
(in CSV format, for example) and use it later for further analysis.

db2top monitors the following snapshot subjects: 

► Database 

► Table space 

► Dynamic SQL 

► Session 

► Buffer pool 

► Lock 

► Table 

► Bottlenecks 

► Utilities 

► Skew monitor (for database partitioned environments) 

In Example 5-3 we start db2top in interactive mode to monitor database edwp.

Example 5-3 Starting db2top in interactive mode

db2top -d edwp


In 6.3, “DB2 Performance troubleshooting” on page 152 we show how to monitor 
performance on IBM Smart Analytics System databases using db2top. 

Documentation about the db2top utility is available at the following websites: 

► DB2 utility tool, db2top:

https://publib.boulder.ibm.com/infocenter/db2luw/v9r7/index.jsp?topic=/com.ibm.db2.luw.admin.cmd.doc/doc/r0025222.html

► IBM Redbooks publication, Up and Running with DB2 on Linux, SG24-6899:

http://www.redbooks.ibm.com/abstracts/SG246899.html

► IBM developerWorks: DB2 problem determination using db2top utility:

http://www.ibm.com/developerworks/data/library/techarticle/dm-0812wang/

DB2 problem determination command: db2pd

The db2pd command is a problem determination tool that collects information 
without acquiring any latches or using any DB2 engine resources. db2pd reads 
information directly from the memory sets. The db2pd command supports 
partitioned databases and can be used for IBM Smart Analytics System. 

Documentation about the db2pd command is available at the following websites:

► DB2 Monitoring and troubleshooting using the db2pd command:

https://publib.boulder.ibm.com/infocenter/db2luw/v9r7/index.jsp?topic=/com.ibm.db2.luw.admin.trb.doc/doc/c0054595.html

► DB2 db2pd command reference:

https://publib.boulder.ibm.com/infocenter/db2luw/v9r7/index.jsp?topic=/com.ibm.db2.luw.admin.cmd.doc/doc/r0011729.html

► IBM Redbooks publication, Up and Running with DB2 on Linux, SG24-6899:

http://www.redbooks.ibm.com/abstracts/SG246899.html?Open

DB2 memory tracker command: db2mtrk 

The db2mtrk command reports the memory usage and memory pool allocation
for the DB2 instance and databases. db2mtrk is a partition-based command, and you
can invoke db2mtrk from any database partition defined in the db2nodes.cfg file.
When the instance level information is returned, the command returns
information about the attached database partition only.

Example 5-4 shows sample db2mtrk output about the DB2 instance and
single-partition database memory information.

Example 5-4 db2mtrk output for DB2 instance and single partition database

C:\Documents and Settings\Administrator>db2mtrk -i -d
Tracking Memory on: 2010/10/18 at 14:22:36

Memory for instance

   other       monh        fcmbp
   37.9M       320.0K      52.8M

Memory for database: SAMPLE

   utilh       pckcacheh   other       catcacheh   bph (1)     bph (S32K)
   64.0K       192.0K      128.0K      64.0K       2.3M        832.0K

   bph (S16K)  bph (S8K)   bph (S4K)   shsorth     lockh       dbh
   576.0K      448.0K      384.0K      0           16.6M       22.1M

   apph (57)   apph (56)   apph (55)   apph (54)   apph (53)   apph (51)
   64.0K       64.0K       64.0K       64.0K       64.0K       64.0K

   appshrh
   256.0K
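For trend analysis, it can help to convert db2mtrk figures into plain byte counts. The sketch below pairs one line of pool names with the line of sizes beneath it. It assumes that the K, M, and G suffixes are binary multiples and that pool names are lowercase with an optional parenthesized qualifier, as in the sample output; both points are assumptions, and the function names are hypothetical.

```python
import re

# A pool name such as "lockh" or "bph (S16K)".
POOL_NAME = re.compile(r"[a-z]+(?: \(\w+\))?")
# A size such as "0", "64.0K", or "22.1M".
POOL_SIZE = re.compile(r"\d+(?:\.\d+)?[KMG]?")
# Assumption: the suffixes are binary multiples.
UNIT_BYTES = {"K": 1024, "M": 1024 ** 2, "G": 1024 ** 3}


def parse_size(text):
    """Convert a size such as '64.0K' or '22.1M' to a whole number of bytes."""
    unit = text[-1]
    if unit in UNIT_BYTES:
        return round(float(text[:-1]) * UNIT_BYTES[unit])
    return round(float(text))


def parse_pool_rows(name_line, size_line):
    """Pair each pool name with the size printed beneath it."""
    names = POOL_NAME.findall(name_line)
    sizes = POOL_SIZE.findall(size_line)
    if len(names) != len(sizes):
        raise ValueError("name and size columns do not line up")
    return {name: parse_size(size) for name, size in zip(names, sizes)}


if __name__ == "__main__":
    pools = parse_pool_rows(
        "bph (S16K)  bph (S8K)  shsorth  lockh",
        "576.0K      448.0K     0        16.6M",
    )
    print(pools)
```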


For further information about db2mtrk, see the documentation available at the 
following website: 

http://publib.boulder.ibm.com/infocenter/db2luw/v9r7/topic/com.ibm.db2.luw.admin.cmd.doc/doc/r0008712.html

5.2.2 DB2 Performance Expert for Linux, UNIX, and Windows 

DB2 Performance Expert is part of the IBM Information Management integrated 
tooling for DB2 on Linux, UNIX, and Windows platforms, and is delivered with 
IBM Smart Analytics System versions 5600, 7600, and 7700. DB2 Performance 
Expert tracks database and operating system activities and stores the information
in its own database for later analysis. You can use DB2 Performance
Expert to monitor DB2 on the IBM Smart Analytics System.

DB2 Performance Expert provides four levels of performance monitoring: 

► Online monitoring 

► Short-term history monitoring 

► Long-term history monitoring 

► Exception processing 

IBM Optim Performance Manager, the successor of DB2 Performance Expert,
significantly extends the database monitoring capabilities provided in DB2
Performance Expert. It introduces a new web-based interface, which significantly
simplifies the deployment of the product. The legacy Performance Expert client
component is still available with this version, to allow for smoother migration of
existing DB2 Performance Expert users. At the time of writing, Optim
Performance Manager is not yet shipped with the IBM Smart Analytics System. For
a detailed description of IBM Optim Performance Manager, see the IBM
Redbooks publication, IBM Optim Performance Manager for DB2 for Linux,
UNIX, and Windows, SG24-7925. Contact IBM support for migration details.

Online monitoring 

Online monitoring is used to monitor the current operation of your DB2 system
and the underlying operating system at the point in time when the DBA is
watching the DB2 instance through Performance Expert. This can
help you gather current DB2 system information, current applications, and SQL
workload. Because certain DB2 performance problems are caused by
bottlenecks in the underlying operating system, this information is gathered as
well. PE online monitoring features and functions can help you detect problems
such as long waits and timeouts, deadlocks, and long running SQL statements.

These features and functions give the DBA the ability to drill down to get
more detailed information: set filters to isolate the problem, customize
graphical health charts to visualize how activity and performance evolve over
time, trace SQL activities for a single application or the whole database, and view
and analyze the trace to identify, for example, heavy hitter SQL statements that
need further tuning.

Short-term history monitoring 

Short-term history data can provide information to help a DBA look at specific 
events that occurred in a short interval of time. PE allows the user to configure 
the number of hours PE stores short-term history information. Using PE 
short-term history monitoring mode can help a DBA diagnose deadlocks, long 
running SQL, time-outs, and lock escalations that happened minutes, hours, or 
days ago without the need to reproduce the problems, and monitor other 
aspects, such as UOW or buffer pool, table space, and file system usage. 

Also, for short-term history data, the graphical health charts can be used to 
visualize performance metrics over time in history either to diagnose problems or 
identify trends. For online and short-term monitoring, PE provides the users the 
ability to see detailed information for the following items: 

► Application Summary/Details: 

- Times 

- Locking 

- SQL activity 

- SQL statements 

- Buffer pools 

- Caching 

- Sorting 

- Memory pools 

- Agents 

► Statistic Details: 

- Instance information 

- Database (usage, caches, high water marks, locks, reads, and writes) 

- Table spaces (Space management, read/write and I/O, and containers) 

- Tables 

- Buffer pool (read, write, I/O, and so on) 

- Memory pools 

- Dynamic SQL statement cache details 

- Utility Information 

► Applications in Lock Conflicts/Locking Conflicts

► Locking Conflicts 

► System Health: View DB2 performance information in a graphical format 

► System Parameters - Instance 

► System Parameters - Database 

► Operating System Information: 

- Memory and process configuration, processor status 

- File systems 

► Operating System Status: 

- Memory and CPU usage 

- Running processes 

- Disk utilization 

Long-term history monitoring 

Long-term history data is collected over a period of time. The collected data is
used for trend analysis. PE can help you collect trend analysis data that can be
used to develop a performance baseline for your system. Trend analysis
data can also help you do the following:

► Understand how your system reacts during normal and peak periods, to help
you set realistic performance goals.

► Resolve potential performance problems before they become an issue.

► Understand how your system grows over a period of time.

DB2 PE provides long-term monitoring capability under the following functions: 

► Performance Warehouse and Rules of Thumb: 

PE includes Performance Warehouse, which allows you to quickly and easily 
identify potential performance problems. Performance Warehouse collects 
performance data for SQL, database, buffer pool activity, and the operating 
system. This performance data is used for generating reports. These reports 
can be used for further investigation and trend analysis. Performance 
Warehouse data can also be used for Rules of Thumb (RoT), which is 
included in Performance Warehouse. 

RoT can help a DBA by being proactive in making suggestions on how to 
improve performance. Performance Warehouse provides RoT queries for 
SQL, database, table space, and buffer pool activity. 

► Buffer Pool Analysis: 

Buffer pools are one of the most important aspects for tuning. PE Buffer Pool
Analysis gathers detailed information regarding current buffer pool activity
using snapshots. Buffer Pool Analysis allows the database administrator to
view buffer pool information in a variety of formats, including tables, pie
charts, and diagrams. These formats enable the database administrator to
quickly identify potential problems and do trend analysis.

Exception processing 

Exception processing is another PE feature that allows DBAs to monitor a
database server proactively. DBAs can use the exception processing function to
activate predefined alert sets for OLTP or BI workloads or to configure their own
alerts to notify them when a particular situation has occurred. PE provides
two types of alerts: deadlock and periodic. The alert message can be sent to
specified email addresses, or a user exit can be called that allows you to
exchange the alert message and details with other applications or to execute
actions. The user exits can be used, for example, to send SNMP traps to IBM
Tivoli Enterprise Console when a threshold is reached. In this way, PE can
integrate the IBM Smart Analytics System with the existing enterprise monitoring
environment.

Additionally, signals on the PE client indicate the occurrence of an exception 
together with drill down options. 

DB2 PE high level architecture overview 

The DB2 Performance Expert version 3.2 has two main components: PE Server 
and PE Client. 

► PE Server: 

PE Server collects and stores the performance data of the monitored
DB2 instance. In the IBM Smart Analytics System environment, PE Server is
installed on the management node, and it monitors the production instance
remotely. DB2 Performance Expert stores its metadata and the information
collected from the monitored DB2 instance in the PE database DB2PE.
This database is hosted on the management node under the bcupe instance.

The PE Server uses DB2 snapshot and event monitors to collect DB2
performance data for the online monitoring, short-term history, long-term
history, and exception processing. To reduce overhead on the monitored DB2
instance, PE Server uses DB2 snapshots instead of event monitoring
whenever possible.

► PE Client:

PE Client is the user interface of DB2 Performance Expert. It allows you to
view the performance data collected by PE Server. PE Client does not
communicate with the monitored instance; it always gathers information from
the PE Server. You can use the PE Client to configure the PE Server.

The PE Client must be installed on a workstation apart from the IBM Smart
Analytics System servers.

For further reference about how to install and configure the PE Client, see the 
manual IBM DB2 Performance Expert for Linux, UNIX, and Windows 
Installation and Configuration, GC19-2503-02.

Figure 5-3 illustrates a DB2 Performance Expert architecture on the IBM Smart 
Analytics System. 



Figure 5-3 DB2 Performance Expert architecture on IBM Smart Analytics System 

To start, stop, and check the status of the PE Server, use the following
procedures from the management node:

► To start the DB2 Performance Expert server:

a. Log in to the management node as the DB2 Performance Expert user bcupe.

b. Start the Performance Expert instance:

db2start

c. Issue the following command to start DB2 Performance Expert:

pestart

► To stop the DB2 Performance Expert server:

a. Log in to the management node as the DB2 Performance Expert user bcupe.

b. Issue the following command:

pestop

c. Stop the DB2 Performance Expert instance:

db2stop

► To determine the status of the DB2 Performance Expert server:

Log in to the management node as the DB2 Performance Expert user bcupe
and issue the following command:

pestatus
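The start and stop procedures mirror each other: the instance is started before the PE Server and stopped after it. A wrapper script might encode that ordering as follows. This is a sketch under stated assumptions: it presumes that it runs on the management node as the bcupe user with the commands on the PATH, and the function name and dry-run option are hypothetical.

```python
import subprocess

# Command order taken from the procedures above.
START_SEQUENCE = ["db2start", "pestart"]
STOP_SEQUENCE = ["pestop", "db2stop"]


def run_sequence(commands, dry_run=True):
    """Run the commands in order and return the list that was processed.

    With dry_run=True (the default), nothing is executed; the order is
    only reported. Assumption: real execution happens as user bcupe.
    """
    processed = []
    for command in commands:
        if not dry_run:
            # check=True stops the sequence if a command fails.
            subprocess.run([command], check=True)
        processed.append(command)
    return processed


if __name__ == "__main__":
    print("start:", run_sequence(START_SEQUENCE))
    print("stop: ", run_sequence(STOP_SEQUENCE))
```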

For further documentation and references about DB2 Performance Expert, check 
this website: 

http://www-01.ibm.com/software/data/db2imstools/db2tools-library.html#expert 


Naming: The latest version of DB2 Performance Expert is now named IBM 
Optim Performance Manager. The current versions of the IBM Smart Analytics 
System are delivered with IBM DB2 Performance Expert V3.2. There are no 
restrictions on upgrading DB2 Performance Expert V3.2 on the IBM Smart 
Analytics System to IBM Optim Performance Manager. 

For further references about IBM Optim Performance Manager, see the 
website: 

http://www-01.ibm.com/software/data/optim/performance-manager-extended-edition/ 


5.3 Storage monitoring 

Several storage components in the IBM Smart Analytics System need to be 
considered for monitoring. These include the internal disks on each node that 
are used for the operating system, the SAN switches that connect the nodes to 
the storage subsystems, and the storage subsystems themselves, such as the 
DS3400, DS3500, and DS5300. 

5.3.1 IBM Remote Support Manager 

The most effective method of monitoring the storage subsystems is to use the 
IBM Remote Support Manager (RSM) for Storage product. RSM runs on a 
dedicated server and monitors all storage subsystems in the cluster. Any 
problems are automatically logged with IBM’s call management system, with 
details optionally being emailed to a user-configurable address. 

The configuration of the Remote Support Manager can be examined by logging 
on to the server using its web interface. You need to log in with one of the 
supplied user IDs, such as admin. Passwords for these IDs can be reset as root 
from the command line on the RSM server itself using the rsm-passwd command. 

Figure 5-4 shows the initial RSM panel after you log in. 

Figure 5-4 Initial Remote Support Manager panel 


Although the Remote Support Manager should already be configured to report 
problems to IBM Support, it is best to also configure it to send notifications to 
your system administrators to ensure that you are aware of any problems as soon 
as they occur. 

You can configure this from the main page by selecting System Configuration -> 
Contact Information and entering your primary contact for the system. You must 
also set the “SMTP Server” field on the “Connection Information” page to your 
local mail relay server, so that the Remote Support Manager is able to send 
notifications by email to the address you have specified. 

5.3.2 Internal disks 

You also need to monitor the internal SAS and SATA disks. 

Linux 

The internal disks on Linux-based IBM Smart Analytics System environments 
are monitored by IBM Systems Director, and hardware problems with the disks 
are logged with IBM Systems Director. 

AIX 

The internal disks on AIX-based IBM Smart Analytics System environments are 
standard Power Systems disks, and are monitored by the operating system. Any 
errors with the disks are logged in the system error log, and can be picked up 
from there by a standard Tivoli AIX log adapter. 

5.3.3 SAN switches 

The IBM Smart Analytics System contains a number of SAN switches which are 
used to connect the individual servers to the storage subsystems. We recommend 
that you monitor these switches. 

Although the switches support sending SNMP traps natively, by default they are 
only connected to the internal cluster network. Therefore, it is not possible to 
send SNMP traps directly to your enterprise monitoring environment. By default, 
the SAN switches are configured to send SNMP traps to the IP address of the 
management node, where they are captured by IBM Systems Director. 

You can confirm this by logging on to the web administration interface for the 
switch, selecting Switch Admin from the menu at the left, selecting Show 
Advanced Mode at the top right, and selecting the SNMP tab. The default 
configuration is similar to that shown in Figure 5-5. 

Figure 5-5 Default SNMP configuration for SAN switch 


IBM Systems Director can be configured to forward SNMP traps, which will allow 
the SNMP traps generated by the SAN switches to be picked up by your 
enterprise alerting infrastructure. 

SNMP traps can be forwarded in two ways: 

► Through an Event Action Plan 

► By configuring the SNMPServer.properties file 

To configure using an Event Action Plan, go to the Event Action Plan Builder, and 
select one of the following events, then right-click and select Customize: 

► Send an SNMP Trap to a NetView® Host 

► Send an SNMP Trap to an IP Host 

5.4 Network monitoring 

The IBM Smart Analytics System comes with a number of Ethernet switches that 
are used to provide the interconnects between the various modules which make 
up the system. The switches support SNMP, but similar to the SAN switches, 
their administrative IP interfaces are configured on the internal networks by 
default. In the same way, this can be worked around by configuring the Ethernet 
switches to send SNMP traps to the management node, where they can be 
forwarded to your enterprise monitoring environment. 

The Ethernet switches can be configured either using a web interface or using a 
telnet session. To configure using telnet, perform these steps: 

1. Telnet to the switch IP address from one of the IBM Smart Analytics System 
nodes, and log on (the default user and password for the switches are “admin” 
and “admin,” though these should be changed). 

2. Enter configuration mode using this command: 
configure 

3. You can now use the snmp-server command to configure SNMP traps. 

To get a full list of subcommands, type: 

snmp-server ? 

For example, to configure SNMP traps to be sent to the management node at 
IP address 192.168.111.20 using community string public, use the following 
commands: 

snmp-server host 192.168.111.20 public 
snmp-server enable traps 

As for SAN switch monitoring using IBM Systems Director, SNMP traps can be 
forwarded in two ways: 

► Through an Event Action Plan 

► By configuring the SNMPServer.properties file 

To configure using an Event Action Plan, go to the Event Action Plan Builder, and 
select one of the following events, then right-click and select Customize: 

► Send an SNMP Trap to a NetView Host 

► Send an SNMP Trap to an IP Host 

6 Performance troubleshooting 


In this chapter we introduce performance monitoring on a system with nodes 
running in parallel, monitoring the system both globally and at the individual node 
level. We discuss troubleshooting when a performance issue is suspected by 
using operating system level and database level tools and metrics. 


© Copyright IBM Corp. 2011. All rights reserved. 

6.1 Global versus local server performance troubleshooting 

The IBM Smart Analytics System offering is a DB2 database system, running a 
shared-nothing set of database nodes in parallel. Any SQL workload is spread 
across all nodes simultaneously using the “divide and conquer” approach. 

The resulting work is accomplished in parallel with the results from all of the 
individual nodes compiled together at the end on the administration node and 
shipped back to the SQL client. 

All SQL workload submitted to the IBM Smart Analytics System depends both on 
the efficient orchestration of the work across all the nodes and on effective 
execution at the individual node level. 

Hence, it is critical to observe performance issues from both a “global” view of 
the entire set of nodes (servers) and the more traditional view of the 
individual server’s performance and resource use. Start with the “global” view of 
system performance and resource use, and drill down, if necessary, to the 
individual node level. 

Another important fact to remember when working on performance 
troubleshooting is that problems and symptoms can be “layered”. You often first 
notice a performance problem at the “outer layer” of the operating system 
(OS). After identifying which resource is the problem (such as CPU, I/O, 
or paging), you can drill down a layer for more specifics at the relational 
database management system (RDBMS) layer (and sometimes further to the 
application layer). 

So a natural progression in the performance troubleshooting of the IBM Smart 
Analytics System is to incorporate both notions: start by viewing global OS 
resources (looking for the type of resource overuse or shortage and comparing 
“global” versus local use of OS resources); then drill down to the DB2 database 
layer and try to isolate a specific cause (such as configuration, bad SQL, or too 
much regular workload). 

It is also important to note that, although we are highlighting the notion of a 
“global” perspective of performance and resource monitoring, all the traditional 
methods and tools for viewing performance and monitoring resources at the 
individual server level (such as vmstat, iostat, top, topas, sar, uptime, db2top, 
db2pd, db2mtrk, the DB2 snapshot monitor, and the DB2 event monitor) still 
apply to the IBM Smart Analytics System. They operate just as they do on any 
stand-alone server. 

6.1.1 Running performance troubleshooting commands 


There are various ways that you can run performance troubleshooting 
commands on the IBM Smart Analytics System: 

► Running the stand-alone commands directly on specific nodes 

► Running commands in parallel using the utilities built into the IBM 
Smart Analytics System 

► Using custom methods in more complex situations 

Running commands directly on each physical node 

The simplest method is to run any of the standard performance commands 
directly on each node of interest. To obtain a complete picture of the whole 
system, you need to run the command separately, directly on each node one at a 
time. This method can be impractical, especially if you have many nodes. 

Example 6-1 shows this direct “one-node-at-a-time” method of executing the 
uptime command on three nodes. It is easy to see that this method might be too 
cumbersome on any greater number of nodes. 

Example 6-1 Running uptime separately on multiple nodes 

ISAS56MGMT:~ # ssh ISAS56R1D1 
Last login: Wed Oct 6 12:11:26 2010 from isas56mgmt 
ISAS56R1D1:~ # uptime 
12:13pm up 50 days 2:57, 2 users, load average: 0.96, 0.47, 0.17 
ISAS56R1D1:~ # exit 
logout 
Connection to ISAS56R1D1 closed. 

ISAS56MGMT:~ # ssh ISAS56R1D2 
Last login: Wed Oct 6 12:12:48 2010 from isas56mgmt 
ISAS56R1D2:~ # uptime 
12:13pm up 50 days 3:06, 1 user, load average: 0.97, 0.51, 0.20 
ISAS56R1D2:~ # exit 
logout 
Connection to ISAS56R1D2 closed. 

ISAS56MGMT:~ # ssh ISAS56R1D3 
Last login: Wed Oct 6 05:16:38 2010 from isas56mgmt 
ISAS56R1D3:~ # uptime 
12:13pm up 50 days 3:06, 1 user, load average: 0.90, 0.57, 0.24 
ISAS56R1D3:~ # exit 
logout 
Connection to ISAS56R1D3 closed. 
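The one-node-at-a-time approach can be scripted with a simple loop. The 
following is a minimal sketch, not part of the IBM Smart Analytics System 
tooling: the function name run_on_nodes and the REMOTE_SHELL variable are 
hypothetical, and the sketch assumes that passwordless ssh from the management 
node to each node is already set up. 

```shell
# Run one command on each listed node, one node at a time.
# run_on_nodes and REMOTE_SHELL are illustrative names, not system utilities.
run_on_nodes() {
    cmd=$1
    shift
    for node in "$@"; do
        printf '%s: ' "$node"
        # Defaults to ssh; set REMOTE_SHELL to use another transport.
        ${REMOTE_SHELL:-ssh} "$node" "$cmd"
    done
}

# Usage (assumes passwordless ssh to the nodes):
#   run_on_nodes uptime ISAS56R1D1 ISAS56R1D2 ISAS56R1D3
```

Because the loop waits for each node before moving to the next, the total run 
time grows with the number of nodes, which is the limitation that the parallel 
utilities address. 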


Running commands across multiple physical nodes in parallel using dsh 

In addition to running commands directly on each node one at a time, you 
can use the dsh utility on the management node to run one or more 
commands across all (or a chosen subset) of the physical nodes. You must run 
dsh as the root user. The dsh utility is available on the AIX-based IBM Smart 
Analytics System 7600 and 7700 environments. The IBM Smart Analytics 
System 5600 does not include dsh by default; however, you can download and 
install it. 

Example 6-2 shows that dsh executes commands once per physical node for all 
physical nodes. It also shows that the command executes in parallel (not 
serially): the date command returns identical timestamps, which would not be 
the case if the commands had been launched serially. 

Example 6-2 also shows that the output returned by dsh is not in the order 
launched, but rather in the order that the commands completed on the nodes 
(output for node ISAS56R1D5 comes before node ISAS56R1D1, and output for 
nodes ISAS56R1D3 and ISAS56R1D4 comes before node ISAS56R1D2). 

Example 6-2 dsh launches command in parallel across all nodes chosen 

ISAS56MGMT:~ # dsh -a "sleep 5;echo `date` ' : IP addr ==> '`hostname -i`" 
ISAS56R1D5: Sat Oct 9 18:21:28 EST 2010 : IP addr ==> 172.16.10.10 
ISAS56R1D1: Sat Oct 9 18:21:28 EST 2010 : IP addr ==> 172.16.10.10 
ISAS56R1D3: Sat Oct 9 18:21:28 EST 2010 : IP addr ==> 172.16.10.10 
ISAS56R1D4: Sat Oct 9 18:21:28 EST 2010 : IP addr ==> 172.16.10.10 
ISAS56R1D2: Sat Oct 9 18:21:28 EST 2010 : IP addr ==> 172.16.10.10 
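On systems where dsh is not installed, the parallel behavior can be 
approximated with shell background jobs. The following is a rough sketch under 
stated assumptions, and it is not how dsh itself is implemented: the names 
parallel_on_nodes and REMOTE_SHELL are hypothetical, passwordless ssh to every 
node is assumed, and the command is expected to return a single line of output. 

```shell
# Launch the same command on every listed node at once and print each
# result as soon as that node finishes.
# parallel_on_nodes and REMOTE_SHELL are illustrative names.
parallel_on_nodes() {
    cmd=$1
    shift
    for node in "$@"; do
        (
            # Capture the remote output first so each node prints as one unit.
            out=$(${REMOTE_SHELL:-ssh} "$node" "$cmd" 2>&1 </dev/null)
            printf '%s: %s\n' "$node" "$out"
        ) &
    done
    wait    # block until every background job has finished
}

# Usage: results arrive in completion order, so sort them by node name.
#   parallel_on_nodes uptime ISAS56R1D1 ISAS56R1D2 ISAS56R1D3 | sort
```

As with dsh, the lines arrive in the order in which the nodes finish, so the 
result is piped through sort when node order matters. 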


Example 6-3 shows how to use dsh to run the performance command uptime 
across all physical database nodes and display the output in node name order 
using the UNIX sort command. 


Example 6-3 Using dsh to run the uptime command across all nodes 

ISAS56MGMT:~ # dsh -a uptime | sort 
ISAS56R1D1: 11:24pm up 52 days 14:09, 3 users, load average: 0.01, 0.02, 0.00 
ISAS56R1D2: 11:24pm up 52 days 14:17, 0 users, load average: 0.00, 0.00, 0.00 
ISAS56R1D3: 11:24pm up 52 days 14:17, 1 user, load average: 0.00, 0.00, 0.82 
ISAS56R1D4: 11:24pm up 52 days 14:17, 0 users, load average: 0.00, 0.00, 0.00 
ISAS56R1D5: 11:24pm up 52 days 14:17, 0 users, load average: 0.11, 0.03, 0.01 


Running commands across multiple physical nodes serially using rah 

The rah utility executes your command across all physical database nodes 
serially one at a time, and returns the results in order. The rah utility is perfect for 
obtaining information that is “physical-node” oriented. You can run this utility from 
the administration node as the DB2 instance owner user ID. 

Example 6-4 demonstrates how to use the rah utility to check the db2sysc UNIX 
processes running on all physical nodes. Because UNIX processes are created 
and managed at the physical UNIX node level, rah is the proper tool to use to 
check physical-level UNIX process information across all physical UNIX nodes. 

Example 6-4 Using rah to check UNIX process db2sysc 

bculinux@ISAS56R1D1:~> rah 'ps aux | grep db2sysc | grep -v grep' 

bculinux 5999 0.7 0.9 7098148 639640 ? Sl 23:42 0:00 db2sysc 0 
ISAS56R1D1: ps aux | grep db2sysc ... completed ok 

bculinux 31800 1.0 9193208 683176 ? Sl 23:42 0:00 db2sysc 1 
bculinux 31813 0.8 9193212 581936 ? Sl 23:42 0:00 db2sysc 2 
bculinux 31836 0.8 9193208 581828 ? Sl 23:42 0:00 db2sysc 3 
bculinux 31846 0.8 9193212 581908 ? Sl 23:42 0:00 db2sysc 4 
ISAS56R1D2: ps aux | grep db2sysc ... completed ok 

bculinux 28802 1.0 9193208 683180 ? Sl 23:42 0:00 db2sysc 5 
bculinux 28812 0.8 9193212 581932 ? Sl 23:42 0:00 db2sysc 6 
bculinux 28822 0.8 9193212 581836 ? Sl 23:42 0:00 db2sysc 7 
bculinux 28845 0.8 9193208 581904 ? Sl 23:42 0:00 db2sysc 8 
ISAS56R1D3: ps aux | grep db2sysc ... completed ok 


Because rah is a physical node oriented utility, running commands across all 
physical database nodes as opposed to across all logical database partitions, 
using this utility with commands meant to be used on a logical database partition 
level will yield incomplete results. The command rah only executes on the first 
logical database partition of a given physical node. 
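The mapping between logical database partitions and physical nodes is recorded 
in the instance's db2nodes.cfg file, where each line lists a partition number, 
a host name, and a logical port. As a quick check before choosing between a 
physical-node utility and a partition-level one, the following sketch counts 
the logical partitions on each physical host. The function name 
partitions_per_host is hypothetical, and the path in the usage comment assumes 
the usual sqllib location under the instance owner's home directory. 

```shell
# Count logical database partitions per physical host from a db2nodes.cfg
# file (columns: partition number, host name, logical port, ...).
# partitions_per_host is an illustrative name, not a DB2 utility.
partitions_per_host() {
    awk 'NF >= 3 { count[$2]++ }
         END { for (host in count) print host, count[host] }' "$1" | sort
}

# Usage, as the DB2 instance owner (path is an assumption):
#   partitions_per_host ~/sqllib/db2nodes.cfg
```

A host that reports more than one partition is a host on which a command run 
once per physical node sees only part of the partition-level picture. 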

Example 6-5 shows that rah is not intended for running logical database 
partition level commands such as db2 list active databases. The IBM Smart 
Analytics System 5600 has four database partitions (logical database partitions) 
per physical data node. When checking on all active partitions, the expected 
output is four partitions on each data node plus one for the administration 
node. However, the output shows only one active database partition per 
physical node. For example, node ISAS56R1D2 shows only one out of the expected 
four active database partitions. 

Example 6-5 Improper use of rah: checking logical node database information 

bculinux@ISAS56R1D1:~> rah 'db2 list active databases' 

list active databases 

                           Active Databases 

Database name                              = BCUKIT 
Applications connected currently           = 0 
Database path                              = /db2fs/bculinux/NODE0000/SQL00001/ 

ISAS56R1D1: db2 list active databases completed ok 

list active databases 

                           Active Databases 

Database name                              = BCUKIT 
Applications connected currently           = 0 
Database path                              = /db2fs/bculinux/NODE0001/SQL00001/ 

ISAS56R1D2: db2 list active databases completed ok 

list active databases 

                           Active Databases 

Database name                              = BCUKIT 
Applications connected currently           = 0 
Database path                              = /db2fs/bculinux/NODE0005/SQL00001/ 

ISAS56R1D3: db2 list active databases completed ok 


Running commands across multiple logical database partitions serially using 
db2_all 

The db2_all utility executes your command across all logical database partitions 
serially one at a time, and returns the results in order. The db2_all utility is 
perfect for obtaining information that is “logical-node” oriented. You can run this 
utility from the administration node as the DB2 instance owner user ID. 

Example 6-6 demonstrates using db2_all to run the logical database partition 
level commands db2 list active databases. The output shows the expected 
four database partitions per physical node. 

Example 6-6 Proper use of db2_all: checking logical node db information 

bculinux@ISAS56R1D1:~> db2_all 'db2 list active databases' 

list active databases 

                           Active Databases 

Database name                              = BCUKIT 
Applications connected currently           = 0 
Database path                              = /db2fs/bculinux/NODE0000/SQL00001/ 

ISAS56R1D1: db2 list active databases completed ok 

list active databases 

                           Active Databases 

Database name                              = BCUKIT 
Applications connected currently           = 0 
Database path                              = /db2fs/bculinux/NODE0001/SQL00001/ 

ISAS56R1D2: db2 list active databases completed ok 

list active databases 

                           Active Databases 

Database name                              = BCUKIT 
Applications connected currently           = 0 
Database path                              = /db2fs/bculinux/NODE0002/SQL00001/ 

ISAS56R1D2: db2 list active databases completed ok 

list active databases 

                           Active Databases 

Database name                              = BCUKIT 
Applications connected currently           = 0 
Database path                              = /db2fs/bculinux/NODE0003/SQL00001/ 

ISAS56R1D2: db2 list active databases completed ok 

list active databases 

                           Active Databases 

Database name                              = BCUKIT 
Applications connected currently           = 0 
Database path                              = /db2fs/bculinux/NODE0004/SQL00001/ 

ISAS56R1D2: db2 list active databases completed ok 

list active databases 

                           Active Databases 

Database name                              = BCUKIT 
Applications connected currently           = 0 
Database path                              = /db2fs/bculinux/NODE0005/SQL00001/ 

ISAS56R1D3: db2 list active databases completed ok 

list active databases 

                           Active Databases 

Database name                              = BCUKIT 
Applications connected currently           = 0 
Database path                              = /db2fs/bculinux/NODE0006/SQL00001/ 

ISAS56R1D3: db2 list active databases completed ok 

list active databases 

                           Active Databases 

Database name                              = BCUKIT 
Applications connected currently           = 0 
Database path                              = /db2fs/bculinux/NODE0007/SQL00001/ 

ISAS56R1D3: db2 list active databases completed ok 

Because db2_all runs on all logical nodes and there can be multiple logical 
database partitions per physical node, when you use db2_all to check any 
physical node level information, the result can be misleading. 

Example 6-7 shows the output when using db2_all to check the db2sysc UNIX 
processes on all the nodes. The output has duplicate process information. For 
example, the PID 27947 information is repeated four times. This is because 
db2_all ran the command four times on the same physical node, once for each 
of the four logical database partitions. 

Example 6-7 Improper use of db2_all: checking UNIX processes db2sysc 

bculinux@ISAS56R1D1:~> db2_all 'ps -ef | grep db2sysc | grep -v grep' 

bculinux  9553  9550  0 Oct05 ? 00:00:30 db2sysc 0 
ISAS56R1D1: ps -ef | grep db2sysc ... completed ok 

bculinux 27947 27942 78 Oct05 ? 12:11:37 db2sysc 1 
bculinux 27961 27955 76 Oct05 ? 11:56:54 db2sysc 2 
bculinux 28391 28282 75 Oct05 ? 11:41:54 db2sysc 3 
bculinux 28401 28399 74 Oct05 ? 11:33:56 db2sysc 4 
ISAS56R1D2: ps -ef | grep db2sysc ... completed ok 

bculinux 27947 27942 78 Oct05 ? 12:11:37 db2sysc 1 
bculinux 27961 27955 76 Oct05 ? 11:56:54 db2sysc 2 
bculinux 28391 28282 75 Oct05 ? 11:41:54 db2sysc 3 
bculinux 28401 28399 74 Oct05 ? 11:33:56 db2sysc 4 
ISAS56R1D2: ps -ef | grep db2sysc ... completed ok 

bculinux 27947 27942 78 Oct05 ? 12:11:37 db2sysc 1 
bculinux 27961 27955 76 Oct05 ? 11:56:54 db2sysc 2 
bculinux 28391 28282 75 Oct05 ? 11:41:54 db2sysc 3 
bculinux 28401 28399 74 Oct05 ? 11:33:56 db2sysc 4 
ISAS56R1D2: ps -ef | grep db2sysc ... completed ok 

bculinux 27947 27942 78 Oct05 ? 12:11:37 db2sysc 1 
bculinux 27961 27955 76 Oct05 ? 11:56:54 db2sysc 2 
bculinux 28391 28282 75 Oct05 ? 11:41:54 db2sysc 3 
bculinux 28401 28399 74 Oct05 ? 11:33:56 db2sysc 4 
ISAS56R1D2: ps -ef | grep db2sysc ... completed ok 

bculinux  1378  1374 70 Oct05 ? 11:03:13 db2sysc 5 
bculinux  1383  1376 73 Oct05 ? 11:26:23 db2sysc 6 
bculinux  1394  1392 73 Oct05 ? 11:24:53 db2sysc 7 
bculinux  1417  1415 72 Oct05 ? 11:17:53 db2sysc 8 
ISAS56R1D3: ps -ef | grep db2sysc ... completed ok 

bculinux  1378  1374 70 Oct05 ? 11:03:13 db2sysc 5 
bculinux  1383  1376 73 Oct05 ? 11:26:23 db2sysc 6 
bculinux  1394  1392 73 Oct05 ? 11:24:53 db2sysc 7 
bculinux  1417  1415 72 Oct05 ? 11:17:53 db2sysc 8 
ISAS56R1D3: ps -ef | grep db2sysc ... completed ok 

bculinux  1378  1374 70 Oct05 ? 11:03:13 db2sysc 5 
bculinux  1383  1376 73 Oct05 ? 11:26:23 db2sysc 6 
bculinux  1394  1392 73 Oct05 ? 11:24:53 db2sysc 7 
bculinux  1417  1415 72 Oct05 ? 11:17:53 db2sysc 8 
ISAS56R1D3: ps -ef | grep db2sysc ... completed ok 

bculinux  1378  1374 70 Oct05 ? 11:03:13 db2sysc 5 
bculinux  1383  1376 73 Oct05 ? 11:26:23 db2sysc 6 
bculinux  1394  1392 73 Oct05 ? 11:24:53 db2sysc 7 
bculinux  1417  1415 72 Oct05 ? 11:17:53 db2sysc 8 
ISAS56R1D3: ps -ef | grep db2sysc ... completed ok 

6.1.2 Formatting the command output 


When there are multiple lines of output per node involved, the output displayed 
can be “busy” and cannot be sorted easily in alphanumeric order by node name. 

Example 6-8 shows a standard vmstat output using the dsh utility. The vmstat 
command reports on each node but not in sequence. For example, node 
ISAS56R1D5 is reported on before ISAS56R1D4, and ISAS56R1D2 is reported 
between ISAS56R1D4 and ISAS56R1D3. The individual vmstat report fields do 
not line up well with one another, and the headers are repeated for every two 
lines of output, which adds unnecessary characters on the screen. Most 
important of all, dsh displays four lines per node on the screen for every one 
useful line of information. This prevents you from seeing all the nodes on the 
screen when you have many nodes on your system. 

Example 6-8 Standard vmstat output using the dsh utility 

ISAS56MGMT:~ # dsh -a 'vmstat 1 2' 
ISAS56R1D5: procs -----------memory---------- ---swap-- -----io---- -system-- -----cpu------ 
ISAS56R1D5: r b swpd free buff cache si so bi bo in cs us sy id wa st 
ISAS56R1D5: 0 0 148 61028836 3272800 403180 0 0 367 106 0 0 1 0 98 1 0 
ISAS56R1D5: 0 0 148 61029012 3272800 403180 0 0 0 0 284 670 0 0 100 0 0 
ISAS56R1D4: procs -----------memory---------- ---swap-- -----io---- -system-- -----cpu------ 
ISAS56R1D4: r b swpd free buff cache si so bi bo in cs us sy id wa st 
ISAS56R1D4: 0 0 0 61516076 2933116 352372 0 0 369 105 0 0 1 0 98 1 0 
ISAS56R1D4: 0 0 0 61516240 2933116 352372 0 0 0 12 285 678 0 0 100 0 0 
ISAS56R1D2: procs -----------memory---------- ---swap-- -----io---- -system-- -----cpu------ 
ISAS56R1D2: r b swpd free buff cache si so bi bo in cs us sy id wa st 
ISAS56R1D2: 9 69 0 16172928 1804884 41932404 0 0 1241 246 0 1 2 0 95 3 0 
ISAS56R1D2: 13 61 0 16174512 1804884 41932404 0 0 487232 29264 35347 155873 52 9 1 38 0 
ISAS56R1D3: procs -----------memory---------- ---swap-- -----io---- -system-- -----cpu------ 
ISAS56R1D3: r b swpd free buff cache si so bi bo in cs us sy id wa st 
ISAS56R1D3: 3 74 0 43792620 1634324 15122076 0 0 1216 244 0 0 2 0 95 3 0 
ISAS56R1D3: 17 57 0 43794452 1634324 15122076 0 0 420624 14416 31996 149589 38 6 1 55 0 


Formatting the output can help you spot the problem when running performance 
troubleshooting commands in parallel. Example 6-9 shows an example of 
formatting the vmstat command output. For convenience, the command is saved 
as an alias, savmstat, so that it can be rerun later. 

Example 6-9 Saving a more complex command as a reusable alias command 

ISAS56MGMT:~ # vi .bash_profile (add alias command to end of file and save) 

alias savmstat="echo '            '`vmstat 1 | head -1`;dsh -a 'vmstat 1 2 | tail -1' | sort" 

ISAS56MGMT:~ # savmstat 
            procs -----------memory---------- ---swap-- -----io---- -system-- -----cpu------ 
ISAS56R1D2: 1 0 0 20396720 1869112 41451836 0 0 0 36 292 747 0 0 100 0 0 
ISAS56R1D3: 0 0 0 47966484 1697744 14776984 0 0 0 0 273 748 0 0 100 0 0 
ISAS56R1D4: 0 0 0 61512632 2934180 353364 0 0 0 0 284 661 0 0 100 0 0 
ISAS56R1D5: 0 0 148 61029704 3273744 402236 0 0 0 0 261 603 0 0 100 0 0 

For more complex output formatting, filtering, reordering, summarizing the 
information at the top of the screen, and merging the output of two or more 
commands, you can use a custom script. As an example, we provide a Perl script, 
sa_cpu_mon.pl, that formats the vmstat output for monitoring CPU resources. 
This script combines the relevant CPU-related elements from both the vmstat and 
uptime commands, computes system-wide averages, and displays the results in 
node name order for all physical nodes. You can use this Perl script for both 
Linux-based and AIX-based IBM Smart Analytics System offerings. 

Example 6-10 shows a sample output of sa_cpu_mon with CPU performance from 
a system-wide glance at all nodes in parallel and system average statistics 
summarized at the top. The information is parsed and reformatted into an 
easy-to-view display. 

Example 6-10 Script formatted vmstat with load averages 

                              Load Average 
                        1min    5mins   15mins 

System Avg:  1.2  2.2            7.46     9.39 

ISAS56R1D1              0.08     0.07     0.01 
ISAS56R1D2             10.86    19.25    23.92 
ISAS56R1D3             10.98    17.96    23.00 
ISAS56R1D4              0.02     0.01     0.00 
ISAS56R1D5              0.00     0.00     0.00 


The following columns are shown: 

► Run Queue: The count of all processes ready to run, running, or waiting to 
run on an available CPU. 

► Block Queue: All processes waiting for data to be returned from I/O before 
they can resume work. 

► CPU: The standard four vmstat CPU columns: 

- usr - The percentage of user (application) CPU usage. 

- sys - The percentage of system (running UNIX kernel code) CPU usage. 

- idle - The percentage of unused CPU. 

- wio - The percentage of CPU time spent waiting for an I/O operation to 
complete. 

► Load Average: The load average values reported by the uptime command (the 
running average load for the past 1 minute, 5 minutes, and 15 minutes). That 
is, the count of all running or runnable tasks plus the count of all tasks 
waiting on I/O. 

To have a parallel view of I/O resource usage across all the nodes of the IBM 
Smart Analytics System, we provide a Perl script, sa_io_mon.pl. This script pulls 
relevant I/O-related elements from both the iostat command and the /proc/stat 
system file, computes system-wide averages, and displays the results in node 
name order for all physical nodes. 

Another Perl script that we provide for monitoring the memory paging activity is 
sa_paging_mon.pl. This script combines all relevant memory and swap space 
resources and activities. 

For the source code of these scripts, see Appendix A, “Smart Analytics global 
performance monitoring scripts”. 
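As a rough illustration of the kind of aggregation that such a script 
performs, the following shell sketch computes system-wide averages from the 
one-line-per-node vmstat output produced by dsh. It is not the Appendix A 
script. The function name vmstat_summary is hypothetical, and the sketch 
assumes the 17-column Linux vmstat layout with the dsh node name prefix as the 
first field; the AIX vmstat layout differs and would need different column 
numbers. 

```shell
# Summarize "node: <vmstat sample>" lines read from standard input.
# Assumes Linux vmstat columns: r b swpd free buff cache si so bi bo
# in cs us sy id wa st, preceded by the node name that dsh adds.
# vmstat_summary is an illustrative name, not a system utility.
vmstat_summary() {
    sort | awk '
        NF >= 18 {
            n++
            rq += $2; bq += $3; us += $14; sy += $15; id += $16; wa += $17
            rows[n] = sprintf("%-12s %6d %6d %5d %5d %5d %5d",
                              $1, $2, $3, $14, $15, $16, $17)
        }
        END {
            if (n == 0) exit 1
            printf "%-12s %6s %6s %5s %5s %5s %5s\n",
                   "Node", "RunQ", "BlkQ", "usr", "sys", "idle", "wio"
            printf "%-12s %6.1f %6.1f %5.1f %5.1f %5.1f %5.1f\n",
                   "System Avg:", rq/n, bq/n, us/n, sy/n, id/n, wa/n
            for (i = 1; i <= n; i++) print rows[i]
        }'
}

# Usage on the management node:
#   dsh -a 'vmstat 1 2 | tail -1' | vmstat_summary
```

Putting the averages on the first data line makes a single busy node easy to 
spot, because its row differs sharply from the system-wide figures. 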


6.2 Performance troubleshooting at the operating system level 

When conducting performance troubleshooting, first take an overall look at the 
system resources being used across the data nodes and administration nodes 
that carry the SQL workload. 

In this section we discuss the performance troubleshooting methods and tools to 
analyze resource issues at the operating system layer. 

6.2.1 CPU, run queue, and load average monitoring 

In this section we go through the process of troubleshooting CPU resource 
issues across the system, such as high CPU usage, a high number of processes in 
the run queue, and higher-than-normal uptime load averages. To troubleshoot 
CPU resource issues on the IBM Smart Analytics System, check the resource at 
the entire system level, then at the node level, and then drill down to the 
process level. 

Checking on the global system level 

Start off by checking the CPU-related resources across the entire IBM Smart 
Analytics System to find the nodes that consume the most CPU resources. 

Here we use the custom tool sa_cpu_mon to check CPU-related resource 
consumption on all nodes: 

$ ./sa_cpu_mon.pl 

Example 6-11 shows a snapshot of the CPU usage across all the nodes of an 
IBM Smart Analytics System. 

Example 6-11 Identifying CPU “hot spots” at the node level 

# ./sa_cpu_mon.pl 

sa_cpu_mon      Run   Block            CPU                  Load Average 
              Queue   Queue    usr    sys   idle    wio     1min   5mins  15mins 

System Avg:    26.6     3.2   20.0    0.2   79.4    0.4    22.84    9.62    6.03 

ISAS56R1D1:     0.0     0.0    0.0    0.0  100.0    0.0     0.24    0.53    0.45 
ISAS56R1D2:   133.0    16.0   99.0    1.0    0.0    0.0   112.23   45.38   27.56 
ISAS56R1D3:     0.0     0.0    1.0    0.0   97.0    2.0     1.70    2.16    2.14 
ISAS56R1D4:     0.0     0.0    0.0    0.0  100.0    0.0     0.00    0.00    0.00 
ISAS56R1D5:     0.0     0.0    0.0    0.0  100.0    0.0     0.01    0.01    0.00 


In this example, the node ISAS56R1D2 stands out among all the nodes as the 
greatest consumer of CPU resources in the entire system. User application code 
takes up 99% of the CPU time. Because there are 16 CPUs per node, a run queue 
of 133 means that about 117 processes (133 minus 16) were waiting for an 
available CPU. The high load averages tell us that this is not just a “spike” in 
the run queue, but rather that it has been very high for at least the past 
minute, and also appears to have been higher than on the other nodes for at 
least the past 15 minutes. 

On this node, the load average is also higher for the past 5 minutes (45.38) 
than for the past 15 minutes (27.56), and higher still for the past minute 
(112.23). This seems to indicate that the workload on the system has been 
rising, and the trend suggests that it might continue to do so, so we might 
want to investigate further before it becomes a bigger issue. 
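The two readings used here, the number of processes queued beyond the 
available CPUs and the direction of the load averages, are simple arithmetic 
and can be sketched as shell helpers. The function names runq_excess and 
load_trend are hypothetical, and the three-way classification is only one 
possible way to label a trend. 

```shell
# Number of runnable processes beyond the available CPUs (0 if none).
# Accepts a run queue value such as 133.0 and drops the fraction.
# runq_excess is an illustrative name, not a system utility.
runq_excess() {
    runq=${1%.*}
    ncpu=$2
    if [ "$runq" -gt "$ncpu" ]; then
        echo $((runq - ncpu))
    else
        echo 0
    fi
}

# Classify the 1-minute, 5-minute, and 15-minute load averages.
# load_trend is an illustrative name, not a system utility.
load_trend() {
    awk -v a="$1" -v b="$2" -v c="$3" 'BEGIN {
        a += 0; b += 0; c += 0          # force numeric comparison
        if (a > b && b > c)      print "rising"
        else if (a < b && b < c) print "falling"
        else                     print "mixed"
    }'
}

# Usage with the ISAS56R1D2 values from the CPU monitoring output:
#   runq_excess 133.0 16           (prints 117)
#   load_trend 112.23 45.38 27.56  (prints rising)
```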

Checking the node level 

After identifying the nodes that have a CPU usage problem, try to isolate which 
processes cause the problem using the ps command. The following ps command, 
which works on both Linux and AIX, shows the current top 10 CPU consumers: 

$ ps aux | head -1; ps aux | sort -nr -k3 | head -10 

In our example, the node ISAS56R1D2 appears to be a “hot node” with higher 
workload and higher CPU resource usage. Example 6-12 shows the top 10 CPU 
consumers on the ISAS56R1D2 node. 


Example 6-12 Identify the current top 10 CPU consumers using ps 

ISAS56R1D2:~ # ps aux | head -1; ps aux | sort -nr -k3 | head -10 
USER     PID   %CPU %MEM VSZ      RSS     TTY STAT START TIME    COMMAND 
bculinux 28068 643  5.7  13931224 3820832 ?   Sl   04:34 17:03   db2sysc 2 
bculinux 28101 607  5.8  13932380 3830956 ?   Sl   04:34 16:00   db2sysc 4 
bculinux 28078 3.8  5.6  13652636 3695396 ?   Sl   04:34 0:06    db2sysc 3 
bculinux 28058 3.8  5.6  13655708 3743236 ?   Sl   04:34 0:06    db2sysc 1 
root     8524  2.0  0.7  503004   466864  ?   Ssl  Aug17 1526:15 /opt/ibm/director/agent/... 
root     2438  0.8  0.0  0        0       ?   S<   Aug17 631:57  [mpp_dcr] 
root     30347 0.6  0.0  9300     1732    ?   Ss   04:37 0:00    bash -c echo `hostname`: `vmstat 5 2 | tail -1` `uptime` 
root     7849  0.5  0.0  0        0       ?   RN   Aug17 441:24  [kipmi0] 
root     30345 0.3  0.0  41872    2920    ?   Ss   04:37 0:00    sshd: root@notty 
root     8054  0.2  0.0  224260   20392   ?   Sl   Aug17 221:12  ./jre/bin/java -Djava.compiler=NONE -cp /usr/RaidMan/RaidMsgExt.jar:/usr/RaidMan/RaidMan.jar com.ibm.sysmgt.raidmgr.agent.ManagementAgent 


Process ID 28068 is currently the highest consumer of CPU resources at 643%
CPU, the equivalent of 6.43 CPUs (100% CPU = 1 CPU) out of a total of 16 CPUs
available on this node. Process ID 28101 is a close second at 607% CPU, the
equivalent of 6.07 CPUs.

Process ID 28068 shows a command of db2sysc 2. In this command, db2sysc is
the main DB2 engine process, and the number 2 next to it tells us that it is the
main DB2 process for DB2 logical database partition 2. The command for the
other high CPU consumer, PID 28101, is db2sysc 4, indicating that it is the main
DB2 engine process for DB2 logical database partition 4. Because the CPU
usage of db2sysc 3 (PID 28078) and db2sysc 1 (PID 28058) is very low and
there is no CPU usage on the other physical nodes, it appears that all the
SQL activity is concentrated on logical partitions 2 and 4. This
situation might indicate a data skew issue, with the data for a specific
table being concentrated on very few database partitions instead of being spread
out evenly across all the database partitions.
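
The reading of the ps output above can be sketched in a few lines of Python.
This is a hedged illustration, not an IBM-supplied tool: the function names are
hypothetical, the sample rows are copied from Example 6-12, and the parser
assumes the 11-column Linux ps aux layout in which the command is the last
field.

```python
# Sample rows copied from Example 6-12 (deliberately listed out of order).
SAMPLE = """\
USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
bculinux 28078 3.8 5.6 13652636 3695396 ? Sl 04:34 0:06 db2sysc 3
bculinux 28068 643 5.7 13931224 3820832 ? Sl 04:34 17:03 db2sysc 2
root 8524 2.0 0.7 503004 466864 ? Ssl Aug17 1526:15 /opt/ibm/director/agent/...
bculinux 28101 607 5.8 13932380 3830956 ? Sl 04:34 16:00 db2sysc 4
bculinux 28058 3.8 5.6 13655708 3743236 ? Sl 04:34 0:06 db2sysc 1
"""


def parse_ps_aux(text):
    """Parse ps aux output into dictionaries (hypothetical helper)."""
    rows = []
    for line in text.splitlines()[1:]:      # skip the header line
        fields = line.split(None, 10)       # the command may contain spaces
        if len(fields) < 11:
            continue
        rows.append({"pid": int(fields[1]),
                     "cpu_pct": float(fields[2]),
                     "command": fields[10]})
    return rows


def top_cpu(rows, count=10):
    """Return the rows with the highest %CPU first."""
    return sorted(rows, key=lambda row: row["cpu_pct"], reverse=True)[:count]


def db2_partition(command):
    """Return the logical partition number of a 'db2sysc N' command, else None."""
    parts = command.split()
    if len(parts) == 2 and parts[0] == "db2sysc" and parts[1].isdigit():
        return int(parts[1])
    return None


for row in top_cpu(parse_ps_aux(SAMPLE), 2):
    # %CPU / 100 gives the equivalent number of CPUs in use
    print(row["pid"], row["cpu_pct"] / 100, db2_partition(row["command"]))
```

With the sample rows, the two busiest processes are PIDs 28068 and 28101,
which belong to logical database partitions 2 and 4.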

Alternatively, you can use the top (Linux) or topas (AIX) commands to list the
current top CPU consumers.


Example 6-13 shows the output of the top command, which confirms what we
discovered in Example 6-12 on page 138, that is, process IDs 28068 and
28101 account for the lion’s share of the CPU resource consumption.

Note that the COMMAND field of the top output does not provide the logical
partition number, so you cannot tell which logical database partition these
db2sysc DB2 engine processes are associated with. In this case, use the ps
command.


Example 6-13 Using top to identify the top CPU consuming processes

top - 04:39:36 up 52 days, 19:32,  4 users,  load average: 104.22, 59.18, 24.40
Tasks: 260 total,  11 running, 249 sleeping,   0 stopped,   0 zombie
Cpu(s): 99.3%us,  0.3%sy,  0.0%ni,  0.0%id,  0.1%wa,  0.0%hi,  0.3%si,  0.0%st
Mem:  65981668k total, 37876324k used, 28105344k free,  1727960k buffers
Swap: 33559744k total,        0k used, 33559744k free, 34399044k cached

  PID USER      PR  NI  VIRT  S %CPU %MEM     TIME+  COMMAND
28068 bculinux  20   0 13.3g               18:15.31  db2sysc
28101 bculinux  25   0 13.3g               10:53.33  db2sysc
28058 bculinux  24   0 13.0g                0:08.61  db2sysc
28078 bculinux  25   0 13.0g                0:08.58  db2sysc
 2438 root       0 -20     0                1:58.50  mpp_dcr
                15   0  219m                1:12.68  java
                16   0   796                0:13.89  init
                RT   0     0                1:00.29  migration/0
                34  19     0                1:50.35  ksoftirqd/0
                RT   0     0                0:00.21  migration/1

Another method for listing the top 10 processes consuming CPU resources is
pidstat. This command is available on Linux only:

$ pidstat | head -3; pidstat 2 1 | egrep -v -i 'average|Linux' | sort -nr -k 5 | head

Example 6-14 shows how the pidstat command can be used to list
the top 10 CPU consuming processes. As with the top command, pidstat
does not show the logical database partition number next to db2sysc.

Example 6-14 Determine the top 10 CPU consuming processes using pidstat

ISAS56R1D2:~ # pidstat | head -3; pidstat 2 1 | egrep -v -i 'average|Linux' | sort -nr -k 5 | head
Linux 2.6.16.60-0.21-smp (ISAS56R1D2)   10/10/10

     PID    %user  %system     %CPU   CPU  Command
   30720    1254.                      14  db2sysc
   30687     330.                      15  db2sysc
   30697                               10  db2sysc
   30650                               10  db2sysc
                                       12  syslog-ng
                                        7  java
                                        0  mpp_dcr
                                       12  klogd
                                        2  pidstat
ISAS56R1D2:~ #


In certain troubleshooting cases, it might be of interest to see which processes
have used a lot of CPU resources over their “lifetime”. This information
can tell you which processes are frequent consumers of CPU resources, not
just at this time but historically. The following commands show how to identify
the top 10 cumulative CPU consumers:

► Linux:

ps aux | head -1; ps aux | sort -nr -k10 | head -10

► AIX:

ps aux | head -1; ps aux | sort -nr -k11 | head -10

Example 6-15 shows an output of the top 10 cumulative CPU consumers on our
Linux system. Note that db2sysc 2 (process ID 28068) appears in the
“historical” top 10 CPU consumer list, but it is not yet close to the top of the
list. This might mean that it is a recently started process, and that it only
made the historical top 10 list because of its high current CPU usage, which is
higher than that of most other processes that have run on the system for a
much longer time. The START column confirms that this process was started just
today, at 04:34 AM. All the other processes in this top 10 historical list date
back to August 17th.

Example 6-15 Linux top 10 cumulative CPU consumers using ps

ISAS56R1D2:~ # ps aux | head -1; ps aux | sort -nr -k10 | head -10
USER       PID %CPU %MEM      VSZ     RSS TTY STAT START    TIME COMMAND
root      8524  2.0  0.7   503004  466864 ?   Ssl  Aug17 1526:15 /opt/ibm/director/agent/...
root      2438  0.8  0.0        0       0 ?   S<   Aug17  631:59 [mpp_dcr]
root      7849  0.5  0.0        0       0 ?   SN   Aug17  441:24 [kipmi0]
root      8054  0.2  0.0   224260   20392 ?   Sl   Aug17  221:12 ./jre/bin/java -Djava.compiler=NONE
-cp /usr/RaidMan/RaidMsgExt.jar:/usr/RaidMan/RaidMan.jar com.ibm.sysmgt.raidmgr.agent.ManagementAgent
root        27  0.0  0.0        0       0 ?   SN   Aug17   54:54 [ksoftirqd/12]
root        13  0.0  0.0        0       0 ?   SN   Aug17   52:28 [ksoftirqd/5]
bculinux 28068  809  6.2 13936552 4109412 ?   Sl   04:34   49:46 db2sysc 2
root        11  0.0  0.0        0       0 ?   SN   Aug17   43:37 [ksoftirqd/4]
root        29  0.0  0.0        0       0 ?   SN   Aug17   41:24 [ksoftirqd/13]
root        17  0.0  0.0        0       0 ?   SN   Aug17   31:04 [ksoftirqd/7]
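
One caveat with sorting on the TIME column: sort -n compares only the leading
number of each value, so a value such as 1526:15 is ranked by its whole minutes.
The following Python sketch converts the values to seconds before ranking them.
It is an illustration only: the function name is hypothetical, and it assumes
the minutes:seconds TIME format shown in Example 6-15.

```python
def cpu_time_seconds(time_field):
    """Convert a ps TIME value in minutes:seconds form, such as '1526:15', to seconds."""
    minutes, seconds = time_field.split(":")
    return int(minutes) * 60 + int(seconds)


# TIME values copied from Example 6-15
cumulative = {
    "db2sysc 2": "49:46",
    "[ksoftirqd/12]": "54:54",
    "/opt/ibm/director/agent/...": "1526:15",
    "[ksoftirqd/4]": "43:37",
}

ranked = sorted(cumulative,
                key=lambda name: cpu_time_seconds(cumulative[name]),
                reverse=True)
for name in ranked:
    print(name, cpu_time_seconds(cumulative[name]))
```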


Checking the process level 

You can further identify which thread is consuming the most CPU on the specific
node by using the ps command. The following commands list the top
10 CPU consuming threads of a given UNIX process ID:

► Linux:

$ ps -Lm -F | head -1; ps -Lm -F -p <PID> | sort -nr -k 5 | head -10

► AIX:

$ ps -mo THREAD | head -1; ps -mo THREAD -p <PID> | sort -rn -k 6 | head -10


A thread is presented as a lightweight process (LWP) on Linux and as a thread
ID (TID) on AIX.

In the last section, we identified that the DB2 db2sysc 2 process (ID 28068)
for logical database partition 2 has high CPU usage. Example 6-16 lists the top
10 CPU consuming threads of this process. The output shows that threads 29887,
29800, and 29746 in the LWP column use 23% CPU each (the equivalent of 0.23 of
one CPU, out of 16 available CPUs on the system), with many others very close
to the same CPU consumption. Because these threads belong to a DB2 process,
you can then cross-reference them by thread ID in DB2 to determine what SQL
or DB2 utility they are actually performing.


Example 6-16 List current top 10 CPU consuming threads using ps

ISAS56R1D2:~ # ps -Lm -F | head -1; ps -Lm -F -p 28068 | sort -nr -k 5 | head -10
UID        PID  PPID   LWP  C NLWP      SZ     RSS PSR STIME TTY      TIME CMD
bculinux 28068 28065     - 99  138 3482806 3889452   - 04:34 ?    00:24:42 db2sysc 2
bculinux     -     - 29887 23    -       -       -  15 04:35 -    00:00:41 -
bculinux     -     - 29800 23    -       -       -  14 04:35 -    00:00:42 -
bculinux     -     - 29746 23    -       -       -  14 04:35 -    00:00:41 -
bculinux     -     - 29861 22    -       -       -   6 04:35 -    00:00:39 -
bculinux     -     - 29838 22    -       -       -  10 04:35 -    00:00:40 -
bculinux     -     - 29766 22    -       -       -  14 04:35 -    00:00:39 -
bculinux     -     - 29912 21    -       -       -   0 04:35 -    00:00:37 -
bculinux     -     - 29884 21    -       -       -   0 04:35 -    00:00:37 -
bculinux     -     - 29744 20    -       -       -   0 04:35 -    00:00:36 -


On Linux, an alternative method to determine the top 10 CPU consuming threads
of a specific process is as follows:

pidstat -t | head -3; pidstat -p 10302 -t 2 1 | egrep -v -i 'average|Linux' | sort -nr -k 6 | head -10

Example 6-17 on page 142 shows this alternative method to list the top 10
threads of a specific process, PID 10302. Later we can track these threads (TID)
in the DB2 tools to determine what they are actually doing.

Example 6-17 Using pidstat to show thread CPU usage

ISAS56R1D2:~ # pidstat -t | head -3; pidstat -p 10302 -t 2 1 | egrep -v -i 'average|Linux' | sort -nr -k 6 | head -10
Linux 2.6.16.60-0.21-smp (ISAS56R1D2)   10/10/10

05:32:32      PID     TID    %user  %system     %CPU  CPU  Command
05:32:34    10302       -  1173.63     2.49  1176.12   10  db2sysc
05:32:34        -   12283    76.12     0.00    76.12    4  db2sysc
05:32:34        -   12183    66.17     0.00    66.17    7  db2sysc
05:32:34        -   12151    60.70     0.00    60.70    5  db2sysc
05:32:34        -   12282    57.71     0.00    57.71    2  db2sysc
05:32:34        -   12020    53.73     0.00    53.73    3  db2sysc
05:32:34        -   12106    51.24     0.00    51.24   11  db2sysc
05:32:34        -   12098    48.76     0.00    48.76    1  db2sysc
05:32:34        -   12185    45.77     0.00    45.77   10  db2sysc
05:32:34        -   12117    45.27     0.00    45.27    9  db2sysc


6.2.2 Disk I/O and block queue 

In this section, we discuss how to troubleshoot disk I/O performance.

Identifying the most I/O consuming nodes

Start by checking the I/O-related resources (I/O wait percentage, the block
queue, and the rolled-up bandwidth utilization of the devices) across the
entire system for all physical nodes.

Here we use the custom Perl script sa_io_mon.pl to check the I/O resource
consumption on a global system level:

$ ./sa_io_mon.pl

Figure 6-1 shows a sample output of sa_io_mon.

Figure 6-1 sa_io_mon output


Example 6-18 is an excerpt of Figure 6-1 showing the columns of interest. The
ISAS56R1D3 node appears to be working the hardest with respect to I/O. In the
Block Queue column, there are 21 processes blocked waiting on I/O.
The wio (waiting on I/O) CPU column is at 54.28%, and there are four disk
devices running at a high percentage of utilization (two between 60-90% and two
near saturation between 90-100%). All these figures are noticeably higher on
node ISAS56R1D3 than on the rest of the nodes in the system. Hence,
we have to drill down to the node level on ISAS56R1D3 to figure out why this
situation is occurring.
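
A simple way to express “noticeably higher than the rest” is to compare each
node with the system average. The following Python sketch is only an
illustration: the function name and the factor of 2 are our assumptions and
are not part of sa_io_mon.pl, and the wio and block queue values are the ones
reported for the five nodes in this example.

```python
def hot_nodes(metric_by_node, factor=2.0):
    """Return the nodes whose value exceeds factor times the system average."""
    average = sum(metric_by_node.values()) / len(metric_by_node)
    if average == 0:
        return []
    return [node for node, value in metric_by_node.items()
            if value > factor * average]


# Per-node values reported by sa_io_mon in this example
wio_pct = {"ISAS56R1D1": 0.15, "ISAS56R1D2": 0.65, "ISAS56R1D3": 54.28,
           "ISAS56R1D4": 0.00, "ISAS56R1D5": 0.02}
block_queue = {"ISAS56R1D1": 0.0, "ISAS56R1D2": 0.0, "ISAS56R1D3": 21.0,
               "ISAS56R1D4": 0.0, "ISAS56R1D5": 0.0}

print(hot_nodes(wio_pct))
print(hot_nodes(block_queue))
```

Both metrics single out ISAS56R1D3, which matches the conclusion drawn from
the sa_io_mon output.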


Example 6-18 Using sa_io_mon to check I/O consumption

./sa_io_mon.pl

              Block  - CPU -          Avg/dev  ----------- IO Device Usage -----------
              Queue      wio    Tot     %util  #Active  -------- Nbr devices --------
                                                         0-30%  30-60%  60-90%  90-100%

System Avg:     4.2    11.02     13      6.73      3.6     2.8     0.0     0.4      0.4

ISAS56R1D1:     0.0     0.15     25      0.15      4.0     4.0     0.0     0.0      0.0
ISAS56R1D2:     0.0     0.65     10      1.05      6.0     6.0     0.0     0.0      0.0
ISAS56R1D3:    21.0    54.28     10     32.34      4.0     0.0     0.0     2.0      2.0
ISAS56R1D4:     0.0     0.00     10      0.02      2.0     2.0     0.0     0.0      0.0
ISAS56R1D5:     0.0     0.02     10      0.07      2.0     2.0     0.0     0.0      0.0


Checking I/O consumption at node level 

At the node level, check the I/O consumption by both process and device:

► Process or thread view:

Check which processes and threads are consuming the most I/O, and which
devices they are accessing:

- On Linux, use dmesg -c

- On AIX, use ps

► File or device view:

Check which devices are experiencing the heaviest I/O load with iostat 1 5.
After identifying which device is receiving the heaviest I/O from specific or
all processes, try to get more specific information about which logical volume
and which file system are involved. We provide the scripts disk2fs.ksh and
fs2disk.ksh to aid in this device-to-file system mapping. See Appendix A,
“Smart Analytics global performance monitoring scripts” on page 281 for the
source code.

Example 6-19 shows the result of using iostat -k -x 5 on node ISAS56R1D3 to
identify the disk devices that are showing high I/O activity. The devices with
the most I/O activity are sdc, sde, dm-0, and dm-2, all at 100% I/O utilization.
That is four devices saturated at 100%, not two as shown in Example 6-18.

Example 6-19 Identify the disk devices with high I/O activity

ISAS56R1D3:~ # iostat -k -x 5

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
          26.74    0.00    2.50   57.71    0.00   13.05

Device:   rrqm/s  wrqm/s      r/s   w/s      rkB/s  wkB/s avgrq-sz avgqu-sz  await  svctm   %util
            0.00    0.60     0.00  0.60       0.00   5.60    18.67     0.00   0.00   0.00    0.00
sdb        32.00    0.60   297.40  0.40  132972.80   4.80   893.07     2.24   7.53   2.21   65.92
sdc        61.80    1.00  3704.40  0.40  263929.60   6.40   142.48    11.23   3.03   0.27  100.00
sdd        32.20    0.60   297.00  0.40  133577.60   4.80   898.33     2.32   7.78   2.21   65.60
sde        76.40    1.60   837.60  0.40  313238.40   8.80   747.61     5.42   6.47   1.19  100.00
dm-0        0.00    0.00   914.00  2.00  313238.40   8.00   683.94     5.80   6.34   1.09  100.00
dm-1        0.00    0.00   330.40  1.00  134089.60   4.00   809.26     2.47   7.43   1.98   65.60
dm-2        0.00    0.00  3766.60  1.40  263993.60   5.60   140.13    11.56   3.07   0.27  100.00
dm-3        0.00    0.00   329.40  1.00  132972.80   4.00   804.94     2.37   7.19   2.00   65.92
            0.00    0.00     0.00  0.00       0.00   0.00     0.00     0.00   0.00   0.00    0.00
            0.00    0.00     0.00  0.00       0.00   0.00     0.00     0.00   0.00   0.00    0.00
            0.00    0.00     0.00  1.20       0.00   4.80     8.00     0.00   0.00   0.00    0.00
dm-7        0.00    0.00     0.00  0.00       0.00   0.00     0.00     0.00   0.00   0.00    0.00
sdf         0.00    0.00     0.00  0.00       0.00   0.00     0.00     0.00   0.00   0.00    0.00


Using the lookup script disk2fs.ksh, Example 6-20 shows which file systems map
to a specific disk device. We see that many of the devices are actually
synonyms for the same disk storage. For example, sdc and dm-2 are really the
same device (both mapped to file system /db2fs/bculinux/NODE0006), and so
are sde and dm-0 (both mapped to file system /db2fs/bculinux/NODE0008). The
sa_io_mon.pl script filters out the redundant statistics to avoid double
counting, which is why Example 6-18 reports only two saturated devices.

Example 6-20 disk2fs.ksh shows which file system maps to a given disk

ISAS56R1D3:/home/pthoreso # ./disk2fs.ksh sdb
ISAS56R1D3: I/O device sdb --> filesystem mountdir: /db2fs/bculinux/NODE0005 (LV:
/dev/vgdb2NODE0005/lvdb2NODE0005)

ISAS56R1D3:/home/pthoreso # ./disk2fs.ksh dm-3
ISAS56R1D3: I/O device dm-3 --> filesystem mountdir: /db2fs/bculinux/NODE0005 (LV:
/dev/vgdb2NODE0005/lvdb2NODE0005)

ISAS56R1D3:/home/pthoreso # ./disk2fs.ksh sdc
ISAS56R1D3: I/O device sdc --> filesystem mountdir: /db2fs/bculinux/NODE0006 (LV:
/dev/vgdb2NODE0006/lvdb2NODE0006)

ISAS56R1D3:/home/pthoreso # ./disk2fs.ksh dm-2
ISAS56R1D3: I/O device dm-2 --> filesystem mountdir: /db2fs/bculinux/NODE0006 (LV:
/dev/vgdb2NODE0006/lvdb2NODE0006)

ISAS56R1D3:/home/pthoreso # ./disk2fs.ksh sdd
ISAS56R1D3: I/O device sdd --> filesystem mountdir: /db2fs/bculinux/NODE0007 (LV:
/dev/vgdb2NODE0007/lvdb2NODE0007)

ISAS56R1D3:/home/pthoreso # ./disk2fs.ksh dm-1
ISAS56R1D3: I/O device dm-1 --> filesystem mountdir: /db2fs/bculinux/NODE0007 (LV:
/dev/vgdb2NODE0007/lvdb2NODE0007)

ISAS56R1D3:/home/pthoreso # ./disk2fs.ksh sde
ISAS56R1D3: I/O device sde --> filesystem mountdir: /db2fs/bculinux/NODE0008 (LV:
/dev/vgdb2NODE0008/lvdb2NODE0008)

ISAS56R1D3:/home/pthoreso # ./disk2fs.ksh dm-0
ISAS56R1D3: I/O device dm-0 --> filesystem mountdir: /db2fs/bculinux/NODE0008 (LV:
/dev/vgdb2NODE0008/lvdb2NODE0008)


Now we know which of the disks are running the “hottest” at 100% utilization and
which file systems they correspond to: sdc/dm-2 = /db2fs/bculinux/NODE0006
(logical database partition 6) and sde/dm-0 = /db2fs/bculinux/NODE0008
(logical database partition 8).

The other busy devices are sdb/dm-3 = /db2fs/bculinux/NODE0005, the file system
for logical database partition 5, and sdd/dm-1 = /db2fs/bculinux/NODE0007, the
file system for logical database partition 7, both at about 65%.

Because the I/O is only in the 0-30% range for the other physical nodes, this
shows that the I/O activity is not evenly spread: logical database partitions 6
and 8 are at 100% utilization, logical database partitions 5 and 7 are at about
65%, and all the other nodes are at less than 30%. This situation might
indicate a case of data skew across the logical database partitions, with
logical database partitions 6 and 8 having the most data for a given table,
logical database partitions 5 and 7 the next most, and much less in all the
other database partitions. This information
can be confirmed at the DB2 level.
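
The double-counting problem described above (sdc and dm-2 being the same
storage) can be handled by grouping device statistics by file system. The
following Python sketch is illustrative: the function name is hypothetical,
the %util values come from Example 6-19, and the device-to-file system map
comes from Example 6-20. Taking the maximum within each group is our assumption
about how to collapse synonyms; the text does not say how sa_io_mon.pl does it.

```python
def utilization_by_filesystem(util_by_device, filesystem_by_device):
    """Collapse device synonyms by keeping the highest %util seen per file system."""
    result = {}
    for device, util in util_by_device.items():
        filesystem = filesystem_by_device.get(device)
        if filesystem is None:
            continue  # device is not mapped to a DB2 file system
        result[filesystem] = max(util, result.get(filesystem, 0.0))
    return result


# %util per device (Example 6-19)
util = {"sdb": 65.92, "sdc": 100.00, "sdd": 65.60, "sde": 100.00,
        "dm-0": 100.00, "dm-1": 65.60, "dm-2": 100.00, "dm-3": 65.92}

# Device to file system mapping (Example 6-20)
filesystems = {
    "sdb": "/db2fs/bculinux/NODE0005", "dm-3": "/db2fs/bculinux/NODE0005",
    "sdc": "/db2fs/bculinux/NODE0006", "dm-2": "/db2fs/bculinux/NODE0006",
    "sdd": "/db2fs/bculinux/NODE0007", "dm-1": "/db2fs/bculinux/NODE0007",
    "sde": "/db2fs/bculinux/NODE0008", "dm-0": "/db2fs/bculinux/NODE0008",
}

by_fs = utilization_by_filesystem(util, filesystems)
saturated = sorted(fs for fs, pct in by_fs.items() if pct >= 90.0)
print(by_fs)
print(saturated)
```

Grouped this way, eight busy devices reduce to four file systems, two of which
(NODE0006 and NODE0008) are saturated.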

Checking I/O consumption at the hardware layer 

Hardware problems can also cause performance issues. In this section we show
how to map a disk device to its LUN and controller.

To check which hardware LUNs are involved, use the following
command to see the disk device to LUN mapping:

► Linux: /opt/mpp/lsvdev

► AIX: mpio_get_config -Av

Example 6-21 shows the LUN to disk device mapping of our “hot I/O” node
ISAS56R1D3. LUN1 for storage array Storage03 maps to disk device sdc, which
corresponds to file system /db2fs/bculinux/NODE0006 for logical database
partition 6, and LUN1 for array Storage04 maps to disk device sde, which
corresponds to file system /db2fs/bculinux/NODE0008.

Example 6-21 lsvdev shows the hardware LUN to disk device mapping (Linux only)

ISAS56R1D3:/home/pthoreso # /opt/mpp/lsvdev
Array Name      Lun    sd device
-------------------------------------
Storage03       0     -> /dev/sdb
Storage03       1     -> /dev/sdc
Storage04       0     -> /dev/sdd
Storage04       1     -> /dev/sde

To find the mapping between the hardware LUN array and the storage controller
and path, use the mppUtil -S command.


Example 6-22 shows the controller to LUN array mapping for the storage arrays
Storage03 and Storage04 identified in Example 6-21.


Example 6-22 mppUtil shows the controller to LUN array mappings

ISAS56R1D3:~ # mppUtil -S
H9C0T2          Active                  Active                  Storage03
         H5C0T2L000 Up           H6C0T2L000 Up
         H7C0T2L000 Up           H8C0T2L000 Up
         H5C0T2L001 Up           H6C0T2L001 Up
         H7C0T2L001 Up           H8C0T2L001 Up
H9C0T3          Active                  Active                  Storage04
         H5C0T3L000 Up           H6C0T3L000 Up
         H7C0T3L000 Up           H8C0T3L000 Up
         H5C0T3L001 Up           H6C0T3L001 Up
         H7C0T3L001 Up           H8C0T3L001 Up


Example 6-23 shows how to interpret the mppUtil -S command output. The
output shows the following information:

► The paths for the two controllers (controllers A and B)

► Which host, channel, and target make up the path to the hardware LUN
arrays

► Which LUN arrays are up or down


Example 6-23 mppUtil man page

ISAS56R1D3:/home/pthoreso # man mppUtil
Decoding the output of mppUtil -S

H6C0T1          Offline                 Active                  ausctlr_34
         H2C0T4L000 Up
         H2C0T4L001 Up
         H2C0T4L003 Up
         H2C0T4L004 Up
         H2C0T4L004 Up
H6C0T0          Active                  Active                  MPP_Yumal
         H2C0T2L000 Up           H2C0T0L000 Up
         H2C0T3L000 Up           H2C0T1L000 Up
         H2C0T2L001 Up           H2C0T0L001 Up
         H2C0T3L001 Up           H2C0T1L001 Up
         H2C0T2L088 Up           H2C0T0L088 Up
         H2C0T3L088 Up           H2C0T1L088 Up

To drill down further into the LUN, controller, and pathway information, use
the mppUtil -a <array name> command.

Example 6-24 shows the details of array Storage03. Note that there are multiple
redundant pathways to the LUN array: through either of the two controllers
(A and B), with two paths for each controller. This output shows that LUN1,
which corresponds to disk sdc, has the LUN identifier (WWN)
600a0b80006771ae0000082e4c3f8580. Controllers A and B have the
following pathways:

► Controller A: Path #1: hostId: 5, channelId: 0, targetId: 2, and Path #2:
hostId: 7, channelId: 0, targetId: 2.

► Controller B: Path #1: hostId: 6, channelId: 0, targetId: 2, and Path #2:
hostId: 8, channelId: 0, targetId: 2.


Example 6-24 Show LUN, controller and pathway information

ISAS56R1D3:/home/pthoreso # mppUtil -a Storage03 | more
Hostname    = ISAS56R1D3
Time        = GMT 10/11/2010 10:12:46

MPP Information:
      ModuleHandle:                       SingleController: N
   FirmwareVersion:                          ScanTriggered: N
     ScanTaskState:                             AVTEnabled: N
          LBPolicy:                             RestoreCfg: N
                                             Page2CSubPage: Y

>773cb000000004bf40c6c

Controller 'A' Status:
   ControllerHandle: none                ControllerPresent: Y
       UTMLunExists: Y (031)                        Failed: N
      NumberOfPaths: 2                      FailoverInProg: N
                                               ServiceMode: N

   Path #1
      DirectoryVertex: present    Present: Y
      PathState: OPTIMAL
      PathId: 77050002 (hostId: 5, channelId: 0, targetId: 2)

   Path #2
      DirectoryVertex: present    Present: Y
      PathState: OPTIMAL
      PathId: 77070002 (hostId: 7, channelId: 0, targetId: 2)

Controller 'B' Status:
   ControllerHandle:                     ControllerPresent: Y
       UTMLunExists:                                Failed: N
      NumberOfPaths:                        FailoverInProg: N
                                               ServiceMode: N

   Path #1
      PathState: OPTIMAL
      PathId: 77060002 (hostId: 6, channelId: 0, targetId: 2)

   Path #2
      DirectoryVertex: present    Present: Y
      PathState: OPTIMAL
      PathId: 77080002 (hostId: 8, channelId: 0, targetId: 2)

Lun #1 - WWN: 600a0b80006771ae0000082e4c3f8580
         LunObject: present              CurrentOwningPath: B
    RemoveEligible: N                       BootOwningPath: B
     NotConfigured: N                        PreferredPath: B
          DevState: OPTIMAL                ReportedPresent: Y
                                           ReportedMissing: N
                                     NeedsReservationCheck: N
                                                 TASBitSet: Y
                                                  NotReady: N
                                                 Quiescent: N

   Controller 'A' Path
      NumLunObjects: 2                     RoundRobinIndex: 0
      Path #1: LunPathDevice: present
               DevState: OPTIMAL
               RemoveState: 0x0  StartState: 0x1  PowerState: 0x0
      Path #2: LunPathDevice: present
               DevState: OPTIMAL
               RemoveState: 0x0  StartState: 0x1  PowerState: 0x0

   Controller 'B' Path
      NumLunObjects: 2                     RoundRobinIndex: 0
      Path #1: LunPathDevice: present
               DevState: OPTIMAL
               RemoveState: 0x0  StartState:      PowerState: 0x0
      Path #2: LunPathDevice: present
               DevState: OPTIMAL
               RemoveState: 0x0  StartState:      PowerState: 0x0


6.2.3 Memory usage 

Memory over-allocation that results in paging is a common cause of
performance degradation. In this section, we discuss how to identify the nodes
that consume the most memory resources on the IBM Smart Analytics System.

Start your monitoring from the global system level. We use the custom script
sa_paging_mon.pl to check the entire system. Figure 6-2 shows an output of
sa_paging_mon.pl with page swapping, real memory usage, and
swap space usage information from an IBM Smart Analytics System 5600.

Figure 6-2 Paging monitoring


For readability, we split the display in half as shown in Example 6-25. There is
no swapping currently occurring on this system. However, if there were, we would
notice it in the Page Swapping columns for pages swapped in and pages
swapped out. Simply monitor for any excessive activity in this area, for the
free memory decreasing excessively, and for the swap space used increasing
from the zero that is normally seen on an IBM Smart Analytics System.

Example 6-25 Formatted sa_paging_mon.pl output

sa_paging_mon    Run    Block    CPU    -- Page Swapping --
                Queue   Queue

              -------- Real Memory --------    -------- Swap Space --------
                 Total       Used       Free       Total    Used       Free
System Avg:   65981668   23395962   42585705    33559744      29   33559714

ISAS56R1D1:   65981668   17371268   48610400    33559744       0   33559744
ISAS56R1D2:   65981668   45034408   20947260    33559744       0   33559744
ISAS56R1D3:   65981668   45132744   20848924    33559744       0   33559744
ISAS56R1D4:   65981668    4479436   61502232    33559744       0   33559744
ISAS56R1D5:   65981668    4961956   61019712    33559744     148   33559596

If you see any abnormal paging activity on a particular node, check which
processes on the node consume the most real memory (RSS) by using the ps aux
command. The following commands show the top 10 real memory consuming
processes:

► Linux: ps aux | head -1; ps aux | sort -nr -k 6 | head

► AIX: ps aux | head -1; ps aux | sort -nr -k 5 | head

Example 6-26 shows an output of the ps aux command on our IBM Smart
Analytics System 5600 V1. Here the main DB2 engine processes, db2sysc (one
per logical database partition), are the processes using the most real memory on
the system (a little over 3.5 GB of memory for each of the four processes). All
other processes consume at least an order of magnitude less.

If paging were occurring, we would want to look at these four DB2 processes to
see why they are using so much memory and whether something can be done to
reduce their memory use until the paging stops.


Example 6-26 Using ps aux to determine top 10 real memory consuming processes

ISAS56R1D3:~ # ps aux | head -1; ps aux | sort -nr -k 6 | head
USER       PID %CPU %MEM      VSZ     RSS TTY STAT START    TIME COMMAND
bculinux 30214  640  5.8 13931356 3845884 ?   Sl   06:10   14:05 db2sysc 8
bculinux 30181  510  5.7 13930272 3795332 ?   Sl   06:10   11:19 db2sysc 6
bculinux 30168  4.1  5.6 13658836 3744564 ?   Sl   06:10    0:05 db2sysc 5
bculinux 30204  4.3  5.5 13651612 3685304 ?   Sl   06:10    0:05 db2sysc 7
root      8524  1.9  0.7   503004  466216 ?   Ssl  Aug17 1553:40
/opt/ibm/director/agent/jvm/jre/bin/java -Xmx384m -Xminf0.01 -Xmaxf0.4
-Dsun.rmi.dgc.client.gcInterval=3600000 -Dsun.rmi.dgc.server.gcInterval=3600000
-Xbootclasspath/a:/opt/ibm/director/agent/runtime/core/rcp/eclipse/plugins/com.ibm.rcp.base_6.1.2.200
801281200/rcpbootcp.jar:/opt/ibm/director/agent/lib/icl.jar:/opt/ibm/director/agent/lib/jaas2zos.jar:
/opt/ibm/director/agent/lib/jaasmodule.jar:/opt/ibm/director/agent/lib/lwinative.jar:/opt/ibm/directo
r/agent/lib/lwirolemap.jar:/opt/ibm/director/agent/lib/passutils.jar:../../runtime/agent/lib/cas-boot
cp.jar -Xverify:none -cp ...
...        ...  ...  0.1   100156   92688 ?   ...  Sep29    0:00 ...
...        ...  ...  0.0  9186600   34080 ?   Sl   ...      0:00 db2wdog 5
...        ...  ...  0.0  9186600   34064 ?   Sl   ...      0:00 db2wdog 7
root     30212  ...  0.0  9186596   34060 ?   Sl   ...      0:00 db2wdog 8
root     30178  ...  0.0  9186600   34060 ?   Sl   ...      0:00 db2wdog 6
ISAS56R1D3:~ #


Drilling down to the thread ID level is not relevant for this type of resource
because all child threads show the same memory usage as their parent
process.


6.2.4 Network 

To check the status of the network interfaces and the number of packets and
bytes received (RX) and transmitted (TX), use netstat and ifconfig.

Example 6-27 shows the output of two ifconfig calls taken 2 seconds apart.

Example 6-27 ifconfig


$ ifconfig bond0 | egrep 'RX|TX'; sleep 2; ifconfig bond0 | egrep 'RX|TX'
RX packets:712683689 errors:0 dropped:0 overruns:0 frame:0
TX packets:972731266 errors:0 dropped:0 overruns:0 carrier:0
RX bytes:584318299554 (557249.3 Mb)  TX bytes:946042644765 (902216.5 Mb)
RX packets:712683755 errors:0 dropped:0 overruns:0 frame:0
TX packets:972731336 errors:0 dropped:0 overruns:0 carrier:0
RX bytes:584318308198 (557249.3 Mb)  TX bytes:946042691841 (902216.6 Mb)
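
Because the two ifconfig samples are taken a known interval apart, the
difference between their byte counters gives an approximate throughput. The
following Python sketch works through that arithmetic with the counters from
Example 6-27. The function names are hypothetical, and the regular expressions
assume the RX bytes and TX bytes layout shown in that example.

```python
import re


def parse_byte_counters(ifconfig_text):
    """Extract the cumulative RX and TX byte counters from ifconfig output."""
    rx = re.search(r"RX bytes:\s*(\d+)", ifconfig_text)
    tx = re.search(r"TX bytes:\s*(\d+)", ifconfig_text)
    return int(rx.group(1)), int(tx.group(1))


def bytes_per_second(first, second, interval_seconds):
    """Average RX and TX rates between two (rx, tx) counter samples."""
    return tuple((after - before) / interval_seconds
                 for before, after in zip(first, second))


# The two samples from Example 6-27, taken 2 seconds apart (sleep 2)
first = parse_byte_counters(
    "RX bytes:584318299554 (557249.3 Mb)  TX bytes:946042644765 (902216.5 Mb)")
second = parse_byte_counters(
    "RX bytes:584318308198 (557249.3 Mb)  TX bytes:946042691841 (902216.6 Mb)")

rx_rate, tx_rate = bytes_per_second(first, second, 2)
print(rx_rate, tx_rate)   # 4322.0 23538.0
```

At roughly 4 KB per second received and 23 KB per second sent, the bond0
interface was nearly idle when these samples were taken.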


You can use the netstat command to check the network configuration and
activity. Example 6-28 shows an output of netstat -i.

Example 6-28 netstat -i

$ netstat -i
Name  Mtu    Network     Address            ZoneID  Ipkts       Ierrs  Opkts       Oerrs
en0   1500   link#2      0.21.5e.79.5d.60   -       15728584    0      12039746    0
en0   1500   10.199.67   ISAS56R1ADMmgt     -       15728584    0      12039746    0
en11  9000   link#3      0.21.5e.89.23.7f   -       2509667120  0      2500771760  3
en11  9000   10.199.66   ISAS56R1ADMcorp    -       2509667120  0      2500771760  3
en12  9000   link#4      0.21.5e.89.23.7e   -       277706206   0      153448153   3
en12  9000   10.199.64   ISAS56R1ADMapp     -       277706206   0      153448153   3
en12  9000   10.199.64   VISAS56R1ADMapp    -       277706206   0      153448153   3
lo0   16896  link#1                         -       451974151   0      448999647   0
lo0   16896  127         loopback           -       451974151   0      448999647   0
lo0   16896  ::1                            0       451974151   0      448999647   0

The netstat -s command shows the statistics for each protocol. Example 6-29
shows an excerpt of the netstat -s command output.

Example 6-29 netstat -s

$ netstat -s
        3154479 calls to icmp_error
        0 errors not generated because old message was icmp
        Output histogram:
                echo reply: 3042
                destination unreachable: 3154479
                echo: 67
                information request reply: 1
        0 messages with bad code fields
        0 messages < minimum length
        0 bad checksums
        0 messages with bad length
        Input histogram:
                echo reply: 76
                destination unreachable: 3153787
                echo: 3039
                address mask request: 153
        3039 message responses generated
igmp:
        0 messages received
        0 messages received with too few bytes
        0 messages received with bad checksum
        0 membership queries received
        0 membership queries received with invalid field(s)
        0 membership reports received
        0 membership reports received with invalid field(s)
        0 membership reports received for groups to which we belong
        10 membership reports sent
tcp:
        3089466626 packets sent
                488366659 data packets (3898579545 bytes)
                217 data packets (2566776 bytes) retransmitted
                1082199945 ack-only packets (8357336 delayed)
                2 URG only packets

6.3 DB2 Performance troubleshooting


In the previous section, we reviewed operating system commands and methods
to check for CPU, memory, I/O, and network bottlenecks on the system. In this
section, we discuss how to check the usage of these resources from a DB2
perspective.

We discuss the most common scenarios, which consist of identifying the overall
workload, SQL statement, or utility that is consuming resources such as CPU,
memory, I/O, and network. For the memory resources, we examine how to review
the overall DB2 memory usage using DB2 commands.


In this section, we use the db2top utility in the performance problem
determination examples. However, there are other options available: native
DB2 snapshots, the db2pd utility, or the new relational monitoring
functions in DB2 9.7.

For high CPU usage situations, one good approach is to isolate the thread
consuming the most CPU at the operating system level, as discussed in the
previous section. In this section, we discuss how to further identify the
detailed activity of the DB2 thread.

The db2pd -edus command is useful for this purpose. Example 6-30 shows an
example of db2pd -edus output from the first data node for logical database
partition one.

Example 6-30 db2pd -edus output

db2pd -edus -dbp 1

Database Partition 1 -- Active -- Up 3 days 15:44:58 -- Date 10/04/2010 14:30:37

List of all EDUs for database partition 1

db2sysc PID: 623
db2wdog PID: 618
db2acd  PID: 721

EDU ID  TID             Kernel TID  EDU Name              USR (s)      SYS (s)
1318    47785151818048  29586       db2agnta (BCUKIT) 1    82.300000    9.090000
1315    47790205954368  29579       db2agntp (BCUKIT) 1     0.190000    0.080000
1314    47790176594240  29578       db2agntp (BCUKIT) 1    44.920000    6.040000
1308    47790201760064  28469       db2agntdp (BCUKIT) 1    0.000000    0.000000
1245    47785143429440  27960       db2agnta (BCUKIT) 1   833.000000   48.130000
1244    47785164400960  27959       db2agntp (BCUKIT) 1    57.820000    4.200000
1243    47785130846528  27958       db2agntp (BCUKIT) 1   586.960000   33.080000
1239    47790424058176  27953       db2agntp (BCUKIT) 1     6.570000    0.440000
1230    47785218926912  27944       db2agntp (BCUKIT) 1   208.780000   33.750000
1228    47785185372480  27943       db2agntp (BCUKIT) 1   267.160000   36.630000
1220    47785160206656  27935       db2agntp (BCUKIT) 1    11.920000    0.740000
1214    47790264674624  27928       db2agnta (BCUKIT) 1     0.500000    0.210000
1213    47790382115136  27927       db2agntp (BCUKIT) 1    47.070000    2.940000
1203    47785319590208  27918       db2agntp (BCUKIT) 1     2.980000    1.930000
1202    47790298229056  27917       db2agnta (BCUKIT) 1   148.150000   13.540000
1201    47785206344000  27913       db2agnta (BCUKIT) 1   123.120000   17.920000
1198    47790352755008  27912       db2agntp (BCUKIT) 1     6.000000    0.250000
1197    47790411475264  27910       db2agnta (BCUKIT) 1    46.260000    4.460000
1194    47790319200576  27903       db2agntp (BCUKIT) 1   173.240000    7.410000
1182    47790222731584  27896       db2agntp (BCUKIT) 1   167.260000   43.840000
1181    47790235314496  27895       db2agntp (BCUKIT) 1   301.890000   28.780000
1174    47790415669568  27889       db2agntp (BCUKIT) 1    19.950000    2.380000
1159    47785193761088  27874       db2agntp (BCUKIT) 1    62.360000    3.890000
1158    47790260480320  27873       db2agnta (BCUKIT) 1    78.100000   13.630000
1157    47790285646144  27872       db2agntp (BCUKIT) 1   347.670000   44.170000
1156    47790315006272  27870       db2agntp (BCUKIT) 1    18.590000    1.150000
1148    47785235704128  27863       db2agntp (BCUKIT) 1   100.900000    8.310000
1147    47785214732608  27862       db2agent (idle) 1      96.790000    5.150000
1124    47785277647168  26568       db2pfchr (BCUKIT) 1     9.930000   13.190000
1123    47785265064256  26567       db2pfchr (BCUKIT) 1    10.020000   13.210000


The header includes key information such as the time since the database
partition was activated and the db2sysc PID associated with the logical
database partition.

The db2pd -edus command provides the following information: 

► EDU ID: DB2 engine dispatchable unit identification (EDU ID) identifies the 
thread from the DB2 perspective, and is useful to match with DB2 outputs 
such as LIST APPLICATIONS, db2diag.log messages, and monitoring 
outputs, as well as running certain DB2 troubleshooting commands. 

► Kernel TID: You can match the kernel TID obtained in the operating system 
level output with this command output to identify a particular DB2 EDU. 

► EDU name: Identifies the thread name. This is useful to understand what the
thread is used for. The DB2 process model topic in the DB2 9.7 Information
Center describes the thread names and their functions:

http://publib.boulder.ibm.com/infocenter/db2luw/v9r7/topic/com.ibm.db2.luw.admin.perf.doc/doc/c0008930.html

► USR (s) and SYS (s): Cumulative user and system CPU time consumed by the
thread, in seconds. This information can be very useful, but note that the
highest cumulative time is not necessarily associated with the thread
consuming the most CPU at the current time. For example, a DB2 agent used by
an application might have completed an expensive SQL statement that increased
its cumulative user CPU time, yet not be consuming the most CPU at this time.
Other threads that might show a high cumulative CPU time are threads spawned
at instance or database activation. For example, the db2fcmr and db2fcms
threads might report a high cumulative CPU time if the instance has been up
for a long time.
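
The columns described above lend themselves to simple scripted processing. The
following Python sketch is illustrative only. It assumes that each data row of
db2pd -edus output (run without the interval suboption) starts with three
integer columns (EDU ID, TID, kernel TID) and ends with two numeric columns
(USR and SYS), with the EDU name in between. The Edu type and the
parse_edu_line function are hypothetical helpers, not part of DB2.

```python
from typing import NamedTuple, Optional


class Edu(NamedTuple):
    """One data row of db2pd -edus output (hypothetical helper type)."""
    edu_id: int
    tid: int
    kernel_tid: int
    name: str
    usr_s: float
    sys_s: float


def parse_edu_line(line: str) -> Optional[Edu]:
    """Return an Edu for a data row, or None for headers and other lines."""
    parts = line.split()
    if len(parts) < 6:
        return None
    try:
        return Edu(
            edu_id=int(parts[0]),
            tid=int(parts[1]),
            kernel_tid=int(parts[2]),
            name=" ".join(parts[3:-2]),  # the EDU name may contain spaces
            usr_s=float(parts[-2]),
            sys_s=float(parts[-1]),
        )
    except ValueError:
        # Non-numeric leading or trailing fields: not a data row.
        return None


# Row taken from the db2pd -edus output for database partition 1.
row = parse_edu_line(
    "1245 47785143429440 27960 db2agnta (BCUKIT) 1 833.000000 48.130000"
)
```

Header lines and the PID summary lines return None, so the function can be
applied to every line of the output.
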
To check the current CPU consumption, use db2pd -edus with the interval
suboption. This suboption adds two columns showing the user and system CPU
usage during the specified interval, and orders the output by decreasing CPU
usage, as in the excerpt shown in Example 6-31.

Example 6-31 db2pd -edus -interval output

# db2pd -edus interval=5 -dbp 1

Database Partition 1 -- Active -- Up 0 days 00:14:28 -- Date 01/06/2011 13:48:58

List of all EDUs for database partition 1

db2sysc PID: 757798
db2wdog PID: 774332
db2acd  PID: 737434

EDU ID  TID   Kernel TID  EDU Name             USR (s)   SYS (s)   USR DELTA  SYS DELTA
3343    3343  1384669     db2loggr (BCUDB3) 1  0.818148  0.409035  0.010282   0.005630
258     258   729173      db2sysc 1            1.715241  0.311393  0.006918   0.002972
2829    2829  643151      db2stmm (BCUDB3) 1   0.067231  0.039958  0.002075   0.000707
6427    6427  2089213     db2fw4 (BCUDB3) 1    0.026378  0.025864  0.000228   0.000300
5399    5399  2072821     db2fw0 (BCUDB3) 1    0.028868  0.025466  0.000229   0.000284


Running db2pd -edus interval=5 (which reports CPU usage over the past 5
seconds, for example) is a good first step in a high CPU usage investigation
to find the top CPU consuming DB2 threads. In certain cases, the CPU
consumption will stand out.
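
To put the delta columns into perspective, the following Python sketch converts
them into an approximate percentage of one CPU. It assumes that USR DELTA and
SYS DELTA are the CPU seconds consumed during the sampling interval. The
function name cpu_percent is hypothetical.

```python
def cpu_percent(usr_delta: float, sys_delta: float, interval_s: float) -> float:
    """Approximate share of one CPU used during the interval, in percent."""
    if interval_s <= 0:
        raise ValueError("interval_s must be positive")
    return (usr_delta + sys_delta) / interval_s * 100.0


# db2loggr row from Example 6-31, sampled with interval=5.
loggr_pct = cpu_percent(0.010282, 0.005630, 5)
print(f"db2loggr used about {loggr_pct:.2f}% of one CPU")
```

A thread that is looping on a CPU would show a value close to 100 with this
calculation.
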
Example 6-32 shows a real example of a DB2 page cleaner, EDU ID 4040, that
has a much higher CPU usage than the rest of the threads on an IBM Smart
Analytics System 7600. Further verification at the operating system level
confirms that this thread is using 100% of the CPU. In this example, the high
cumulative user CPU time is caused by the DB2 page cleaner looping.


Example 6-32 db2pd -edus excerpt

EDU ID  TID   Kernel TID  EDU Name               USR (s)       SYS (s)
8967    8967  1426153     db2agntdp (BCUKIT ) 4  0.000228      0.000036
8710    8710  569875      db2agent (idle) 4      0.008675      0.010229
7941    7941  3064417     db2agntp (BCUKIT) 4    0.078646      0.038762
8686    8686  1528513     db2agntdp (BCUKIT ) 4  0.006494      0.003923
8429    8429  754311      db2agent (idle) 4      0.029122      0.013138
4297    4297  1983293     db2pfchr (BCUKIT) 4    40.026308     22.125700
4040    4040  1766045     db2pclnr (BCUKIT) 4    11530.323230  43.076620
3783    3783  787381      db2dlock (BCUKIT) 4    0.054281      0.010329
3526    3526  1180441     db2lfr (BCUKIT) 4      0.000074      0.000009
3269    3269  1414115     db2loggw (BCUKIT) 4    1.896206      3.202154
3012    3012  1618723     db2loggr (BCUKIT) 4    1.908058      2.395301
2571    2571  508915      db2resync 4            0.000584      0.000768
2314    2314  1073789     db2ipccm 4             0.008313      0.005932
2057    2057  2024273     db2licc 4              0.000106      0.000396
1800    1800  1065723     db2pdbc 4              0.041439      0.019909
1543    1543  2753455     db2extev 4             0.004683      0.014039
1286    1286  1159695     db2fcmr 4              0.022657      0.013068
1029    1029  836607      db2extev 4             0.000105      0.000015
772     772   1143447     db2fcms 4              0.037230      0.026084
515     515   774701      db2thcln 4             0.005262      0.001580
22      2     25017       db2alarm 4             0.054986      0.054972
258     258   733847      db2sysc 4              0.728096      0.821109
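
One way to express "stands out" numerically is to compare the largest
cumulative CPU time with the total across the EDUs listed. The Python sketch
below is a hedged illustration using a few of the USR (s) and SYS (s) values
from Example 6-32. The function name top_consumer is hypothetical, and the
choice of metric (share of total cumulative CPU) is an assumption, not a DB2
rule.

```python
from typing import Dict, Tuple


def top_consumer(cpu_seconds: Dict[str, float]) -> Tuple[str, float]:
    """Return the EDU with the most cumulative CPU time and its share of the total."""
    if not cpu_seconds:
        raise ValueError("no EDUs given")
    total = sum(cpu_seconds.values())
    name = max(cpu_seconds, key=cpu_seconds.get)
    share = cpu_seconds[name] / total if total > 0 else 0.0
    return name, share


# USR (s) + SYS (s) for a few of the EDUs in Example 6-32.
sample = {
    "db2pfchr": 40.026308 + 22.125700,
    "db2pclnr": 11530.323230 + 43.076620,
    "db2loggw": 1.896206 + 3.202154,
    "db2loggr": 1.908058 + 2.395301,
    "db2sysc": 0.728096 + 0.821109,
}
name, share = top_consumer(sample)
print(f"{name} accounts for {share:.1%} of the sampled CPU time")
```

Because these counters are cumulative, a large share only suggests a suspect.
The interval suboption or the operating system tools are still needed to
confirm that the thread is consuming CPU now.
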


6.3.1 CPU consumption 

In the following sections we discuss CPU consumption by applications, utilities, 
and other activities. 

Applications consuming CPU 

In this section, as an example, we examine a way of identifying the application 
using the highest amount of CPU, and then use the db2top utility to obtain further 
details about the application using it. Other methods of narrowing down a top 
CPU consuming thread to an application, and the SQL being executed, are 
mentioned at the end of this section. 

We assume a situation where the IBM Smart Analytics System shows a higher 
than usual CPU usage. 
In 6.2.1, “CPU, run queue, and load average monitoring” on page 137, we
reviewed how to identify the process and threads with the highest CPU usage.
Using that method, we first collect a ps output that shows the thread with the
highest CPU usage on the first data node. Because all db2sysc processes
appear to consume equally high CPU, we can pick any one db2sysc process for
further review. In this example, we get the thread level ps output for the
db2sysc process corresponding to database partition one, with PID 623.

Alternatively, you can use db2pd -edus interval=5 to identify the top CPU
consuming thread.


The ps command output in Example 6-33 shows that thread ID 6644 has the 
highest CPU usage (based on the fifth column). 


Example 6-33 Thread with highest CPU

ps -Lm -Fp 623 | sort -rn +4

bculinux   623   618      -  68  115  3553910  6239920   -  Sep30  ?  14:11:15  db2sysc 1
bculinux     -     -   6644  10    -        -        -  12  12:58  -  00:38:18  -
bculinux     -     -   6778   8    -        -        -   4  12:58  -  00:30:49  -
bculinux     -     -   6701   7    -        -        -   9  12:58  -  00:26:49  -
bculinux     -     -   6793   6    -        -        -   7  12:58  -  00:24:14  -
bculinux     -     -   6617   6    -        -        -   7  12:58  -  00:24:38  -
bculinux     -     -   6711   5    -        -        -   7  12:58  -  00:20:18  -
bculinux     -     -   6677   5    -        -        -   7  12:58  -  00:19:55  -
bculinux     -     -   6648   5    -        -        -  10  12:58  -  00:20:29  -
bculinux     -     -  11346   5    -        -        -   8  14:30  -  00:16:05  -
bculinux     -     -  11345   5    -        -        -  14  14:30  -  00:17:05  -
bculinux     -     -   6815   4    -        -        -   7  12:58  -  00:17:20  -
bculinux     -     -   6807   4    -        -        -   7  12:58  -  00:18:46  -
bculinux     -     -   6739   4    -        -        -   2  12:58  -  00:15:37  -
[...]
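
The selection made by eye from the ps output can also be scripted. The Python
sketch below is an illustration under assumptions: it expects the column
layout of ps -Lm -F (UID PID PPID LWP C NLWP SZ RSS PSR STIME TTY TIME CMD),
where thread lines carry a numeric LWP in the fourth column and the CPU
utilization value C in the fifth. The function name busiest_thread is
hypothetical.

```python
from typing import List, Optional, Tuple


def busiest_thread(ps_lines: List[str]) -> Optional[Tuple[int, int]]:
    """Return (LWP, C) for the thread line with the highest C value."""
    best: Optional[Tuple[int, int]] = None
    for line in ps_lines:
        fields = line.split()
        # Skip the header, the process summary line (LWP is "-"),
        # and anything that does not match the expected layout.
        if len(fields) < 5 or not fields[3].isdigit() or not fields[4].isdigit():
            continue
        lwp, cpu = int(fields[3]), int(fields[4])
        if best is None or cpu > best[1]:
            best = (lwp, cpu)
    return best


sample = [
    "bculinux 623 618 - 68 115 3553910 6239920 - Sep30 ? 14:11:15 db2sysc 1",
    "bculinux - - 6644 10 - - - 12 12:58 - 00:38:18 -",
    "bculinux - - 6778 8 - - - 4 12:58 - 00:30:49 -",
    "bculinux - - 6701 7 - - - 9 12:58 - 00:26:49 -",
]
result = busiest_thread(sample)
```

The returned LWP is the kernel TID that is matched against the db2pd -edus
output in the next step.
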


We then run db2pd -edus -dbp 1 on data node 1 to get the details about all the
threads running on database partition 1. Example 6-34 shows an excerpt of the
db2pd -edus -dbp 1 output.

Example 6-34 db2pd -edus -dbp 1 output from the first data node

# db2pd -edus -dbp 1

Database Partition 1 -- Active -- Up 0 days 20:51:15 -- Date 10/01/2010 19:36:54

List of all EDUs for database partition 1

db2sysc PID: 623
db2wdog PID: 618
db2acd  PID: 721

EDU ID  TID              Kernel TID  EDU Name               USR (s)      SYS (s)
908     47790222731584   11458       db2agntdp (BCUKIT ) 1  0.160000     0.020000
907     47790226925888   11457       db2agntdp (BCUKIT ) 1  235.070000   30.130000
903     47790231120192   11361       db2agntdp (BCUKIT ) 1  47.450000    6.330000
899     47790235314496   11348       db2agntdp (BCUKIT ) 1  0.010000     0.010000
898     47790239508800   11347       db2agntp (BCUKIT) 1    320.480000   17.560000
897     47790243703104   11346       db2agntdp (BCUKIT ) 1  844.460000   120.990000
896     47790247897408   11345       db2agntdp (BCUKIT ) 1  955.250000   70.430000
894     47790252091712   11265       db2agntdp (BCUKIT ) 1  75.910000    9.880000
893     47790256286016   11263       db2agntdp (BCUKIT ) 1  0.490000     0.170000
892     47790260480320   11262       db2agntdp (BCUKIT ) 1  45.380000    4.990000
885     47790264674624   7124        db2agntdp (BCUKIT ) 1  626.010000   77.660000
884     47790268868928   7123        db2agntdp (BCUKIT ) 1  90.400000    11.510000
883     47790273063232   7122        db2agntdp (BCUKIT ) 1  243.340000   14.670000
882     47790277257536   7121        db2agntdp (BCUKIT ) 1  140.020000   17.340000
879     47790281451840   6825        db2agnta (BCUKIT) 1    465.080000   71.830000
869     47790285646144   6816        db2agntdp (BCUKIT ) 1  31.360000    2.190000
868     47790289840448   6815        db2agntdp (BCUKIT ) 1  959.880000   81.030000
867     47790294034752   6814        db2agntdp (BCUKIT ) 1  550.330000   67.770000
861     47790298229056   6807        db2agntdp (BCUKIT ) 1  1015.500000  110.870000
850     47790302423360   6796        db2agnta (BCUKIT) 1    848.240000   50.070000
[...]


The kernel TID returned by the ps command appears in the third column of the
db2pd -edus output. We can use the grep command to filter the output, as shown
in Example 6-35. In this example, we are looking for kernel TID 6644.

Example 6-35 Filtering the output 

db2pd -edus | grep 6644 

698 47785227315520 6644 db2agntp (BCUKIT) 1 2160.030000 160.520000 


In this example, the highest CPU consumer turns out to be a DB2 subagent
with EDU ID 698. If a DB2 agent or subagent consumes a high amount of user
CPU, the CPU consumption is generally related to an expensive query.
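
A plain grep for 6644 matches that string anywhere on a line, including inside
the long TID value in the second column. When that is a concern, the match can
be restricted to the kernel TID column. The following Python sketch shows the
idea. The function name rows_for_kernel_tid is hypothetical, and the second
sample row is invented purely to demonstrate a false match.

```python
from typing import Iterable, List


def rows_for_kernel_tid(lines: Iterable[str], kernel_tid: int) -> List[str]:
    """Keep only rows whose third column equals kernel_tid exactly."""
    wanted = str(kernel_tid)
    matches = []
    for line in lines:
        fields = line.split()
        if len(fields) >= 3 and fields[2] == wanted:
            matches.append(line.strip())
    return matches


sample = [
    "698 47785227315520 6644 db2agntp (BCUKIT) 1 2160.030000 160.520000",
    # Hypothetical row: "6644" occurs inside the TID, so grep 6644 would
    # also return it although its kernel TID is 6650.
    "700 47785266440000 6650 db2agntp (BCUKIT) 1 1.000000 0.500000",
]
hits = rows_for_kernel_tid(sample, 6644)
```

Only the row for EDU ID 698 is returned, which is the agent identified in
Example 6-35.
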

The next step consists of narrowing down the SQL statement executed by that
particular agent, for further investigation. We use db2top to check the application
consuming the most CPU, and match it back to the thread seen at the DB2 level.

Figure 6-3 shows the welcome screen for db2top run from the administration
node using the following command:

db2top -d bcukit

In this case, bcukit represents the database name.





Figure 6-3 db2top welcome screen 


Press 1 to go to the Sessions screen, which lists all the applications. Press z to
sort the display by a column in descending order. You are looking for the
application consuming the most CPU, so sort on the Cpu% total column, which is
column one (column numbering starts at zero). Enter 1 when prompted for the
“Column number for descending sort.”

Figure 6-4 shows the Sessions screen with the applications ordered by percent
of total CPU consumption. In our example, application handle 456 consumes the
most CPU.



In order to see details of the application execution, press a (for agent ID). Enter 
456 when prompted with “Please enter agent id:.” You get further details about 
the SQL statement being executed by the agent. 

Figure 6-5 shows further details about the application running the SQL
statement, including application information (application name, client PID,
and DB2 user) and various statistics such as CPU elapsed time and sorts.



From this screen, we can get further details about the various agents
associated with this application by pressing d.

Figure 6-6 shows all the agents associated with application handle 456 on all
the nodes. You can see that agent TID 698 is associated with the application on
database partition 1. So, we have matched the thread with the highest CPU
usage with an application, and with the SQL statement being executed by that
application.



Figure 6-6 db2top associated agents screen 


You can generate detailed query optimizer information, including an access plan,
using the db2exfmt explain tool if you press x. Further information about the
db2exfmt tool can be found at the following link:

http://publib.boulder.ibm.com/infocenter/db2luw/v9r7/topic/com.ibm.db2.luw.admin.cmd.doc/doc/r0002353.html

Figure 6-7 shows the db2exfmt output, which is opened in the vi editor. At the
bottom of the file, you can see that the file is saved under /tmp/explain.<nnnn>.
Enter :wq to save the file and exit the vi editor.

Press r to return to the previous Sessions screen.



If you want to further identify the application running the culprit SQL statement,
you can press S and capture a global application snapshot.

Figure 6-8 shows the global application snapshot containing all the information
necessary to isolate the application submitting this SQL statement, including the
application name, the inbound IP address, and the connect authorization ID.



An alternative is to look directly for the top consuming SQL statements. From the
welcome screen, press D for Dynamic SQL to see all the statements
executed on the system. Press z to sort in descending order, then enter
column number five for CPU time.

Figure 6-9 shows the SQL screen. After sorting, the top consuming SQL statement
in terms of CPU time appears in the top row.


Figure 6-9 db2top SQL screen shot

You can then press L to obtain further details about the SQL statement. As
shown in Figure 6-9, db2top prompts for the SQL hash string, which is shown
in the first column.

After you enter the SQL hash string, you get the actual statement, as shown in
Figure 6-10. Press x to get its access plan for further review.

If the application running the SQL statement is still running, you can use the
Sessions screen to identify the top CPU consuming application, and get further
details about the application.

In the previous example, we used the db2top utility to map the top consuming
thread to an application and the SQL being executed. Other approaches include
the use of the db2pd utility and the new relational monitoring functions:

► db2pd utility:

- After you have used the db2pd -edus command to identify that the EDU
consuming the most CPU is an agent thread, you can run the following
command on the first data node to determine the application handle
corresponding to the agent EDU ID 698 identified earlier (kernel TID 6644):

db2pd -dbp 1 -agent 698

- This gives details of the application using this agent, including the
application handle 456. You can use the following db2pd command to
obtain further details about the application handle 456, including the SQL
being executed:

db2pd -dbp 1 -apinfo 456 all

► DB2 9.7 relational monitoring interface:

- You can also match the agent EDU ID from the db2pd -edus output to the
AGENT_TID column in the WLM_GET_SERVICE_CLASS_AGENTS_V97
table function output to determine details about the application, including
the application handle and, if relevant, what request and statement the
thread is working on. You can use the EXECUTABLE_ID returned by
WLM_GET_SERVICE_CLASS_AGENTS_V97 to generate the actual
access plan of the statement being executed by the agent, as described in
the DB2 9.7 Information Center:

http://publib.boulder.ibm.com/infocenter/db2luw/v9r7/topic/com.ibm.db2.luw.sql.rtn.doc/doc/r0056251.html


Utilities consuming CPU 

The same method applies to identify the threads consuming CPU at the 
beginning of the investigation. Example 6-36 shows the ps command output and 
the db2pd command to check for the threads consuming the most CPU in this 
case. 


Example 6-36 Identify the thread consuming the most CPU

[...]      -  75  126  3533942  6161004  -  Sep30  [...]  16:15:17  db2sysc 1
[...]  22326  20  [...]  00:01:02  -
[...]  22324  15  [...]  00:00:47  -
[...]  22323  15  [...]  00:00:47  -
[...]  22322  15  [...]  00:00:47  -
[...]  22321  15  [...]  00:00:47  -
[...]  00:51:56  -
[...]  00:34:40  -
[...]  00:26:49  -
[...]  00:27:06  -
[...]  00:24:14  -
[...]  00:23:47  -
[...]  00:22:26  -
[...]  00:24:38  -
[...]  00:18:46  -

# db2pd -edus | grep 2232
973  47790205954368  22329  db2lbm2 1  0.040000   0.080000
972  47790214342976  22328  db2lbm1 1  0.050000   0.110000
971  47790218537280  22327  db2lbm0 1  0.030000   0.100000
970  47790176594240  22326  db2lrid 1  78.300000  2.260000
969  47790189177152  22325  db2lmr 1   5.700000   9.030000
968  47790193371456  22324  [...]      58.430000  2.860000
967  47790197565760  22323  [...]      58.330000  3.090000
966  47790180788544  22322  [...]      58.590000  3.030000
965  47790184982848  22321  [...]      58.130000  3.000000


In this example, we can see that the top consuming thread is the db2lrid
thread with EDU ID 970, as well as certain db2lfrmX threads. These
threads are related to the LOAD utility. The db2lfrmX threads format the records
from the flat file into an internal record format.
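
Recognizing utility threads by name can be automated when scanning long
db2pd -edus listings. The Python sketch below is a hedged illustration: the
pattern covers only the LOAD-related EDU names that appear in this section
(db2lrid, db2lmr, db2lbmX, and db2lfrmX) and is not an exhaustive list. The
names LOAD_EDU_PATTERN and is_load_edu are hypothetical.

```python
import re

# LOAD-related EDU names seen in this section; assumed, not exhaustive.
LOAD_EDU_PATTERN = re.compile(r"^db2l(rid|mr|frm\d*|bm\d*)$")


def is_load_edu(edu_name: str) -> bool:
    """Return True if the EDU name looks like a LOAD utility thread."""
    fields = edu_name.split()
    # The first token is the thread name; the rest is the database
    # name and partition number, which are ignored here.
    return bool(fields) and LOAD_EDU_PATTERN.match(fields[0]) is not None


names = ["db2lrid 1", "db2lfrm3 1", "db2lbm0 1", "db2lmr 1",
         "db2loggw (BCUKIT) 4", "db2lfr (BCUKIT) 4", "db2agntp (BCUKIT) 1"]
load_threads = [n for n in names if is_load_edu(n)]
```

The anchored pattern keeps the logger threads (db2loggw, db2loggr) and the
log file reader (db2lfr) from being classified as LOAD threads, even though
their names share the db2l prefix.
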

Because the high CPU usage is related to a utility, you can press u in db2top
to get details about the utilities running on the system.

Figure 6-11 shows that a LOAD utility is running on the system.



Figure 6-11 db2top Utilities screen 


You can get further details about the utility by running the command LIST
UTILITIES SHOW DETAIL. Example 6-37 shows an excerpt of the command
output, run from the catalog partition, which lists the status of the LOAD
command on all the partitions.

Example 6-37 DB2 list utilities show detail output

# db2 list utilities show detail

ID                               = [...]
Type                             = LOAD
Database Name                    = BCUKIT
Partition Number                 = [...]
Description                      = OFFLINE LOAD CURSOR AUTOMATIC INDEXING
                                   REPLACE NON-RECOVERABLE TPCD.PARTSKW
Start Time                       = 10/01/2010 18:27:48.033098
State                            = Executing
Invocation Type                  = [...]
Progress Monitoring:
   Phase Number                  = [...]
      Description                = SETUP
      Total Work                 = 0 bytes
      Completed Work             = [...]
      Start Time                 = 10/01/2010 18:27:48.033108
   Phase Number                  = [...]
      Description                = LOAD
      Total Work                 = 40006935 rows
      Completed Work             = 40006935 rows
      Start Time                 = 10/01/2010 18:27:52.134872
   Phase Number [Current]        = [...]
      Description                = BUILD
      Total Work                 = 1 indexes
      Completed Work             = [...]
      Start Time                 = 10/01/2010 18:34:35.651875

ID                               = [...]
Type                             = LOAD
Database Name                    = BCUKIT
Partition Number                 = 2
Description                      = OFFLINE LOAD CURSOR AUTOMATIC INDEXING
                                   REPLACE NON-RECOVERABLE TPCD.PARTSKW
Start Time                       = 10/01/2010 18:27:48.334842
State                            = [...]
Invocation Type                  = [...]
Progress Monitoring:
   Phase Number                  = [...]
      Description                = SETUP
      Total Work                 = [...] bytes
      Completed Work             = [...] bytes
      Start Time                 = 10/01/2010 18:27:48.334851
   Phase Number                  = 2
      Description                = LOAD
      Total Work                 = 39997915 rows
      Completed Work             = 39997915 rows
      Start Time                 = 10/01/2010 18:27:51.835023
   Phase Number [Current]        = [...]
      Description                = BUILD
      Total Work                 = 1 indexes
      Completed Work             = 1 indexes
      Start Time                 = 10/01/2010 18:34:11.547275
[...]


After you have identified the LOAD job, you have various options to minimize its
impact on the system, including:

► Adjust the number of db2lfrmX threads on each database partition by
setting the CPU_PARALLELISM option. In this particular example, the 5600 V1
has 16 logical CPUs per data node. Each data node has four logical database
partitions. So, DB2 spawns a total of 16 db2lfrmX threads, with 4 db2lfrmX
threads per logical partition. You can reduce this number to get a smaller
number of threads per partition.

► Reduce the priority of LOAD using DB2 workload manager, if implemented.
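
The arithmetic behind the formatter count in this example can be written down
as a short Python sketch. It is a simplification that only reproduces the
numbers quoted above (16 logical CPUs shared by four logical partitions). How
DB2 actually derives its default CPU_PARALLELISM value is not modeled here,
and the function name formatters_per_partition is hypothetical.

```python
def formatters_per_partition(logical_cpus: int, partitions_per_node: int) -> int:
    """Evenly divide a node's logical CPUs among its logical database partitions."""
    if logical_cpus <= 0 or partitions_per_node <= 0:
        raise ValueError("both arguments must be positive")
    # Never go below one formatter per partition.
    return max(1, logical_cpus // partitions_per_node)


per_partition = formatters_per_partition(16, 4)  # 5600 V1 data node
total_on_node = per_partition * 4                # db2lfrmX threads on the node
print(per_partition, total_on_node)
```

Setting CPU_PARALLELISM to a value below the per-partition result, for example
2, would halve the number of formatter threads on the node in this scenario.
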

Other DB2 utilities such as BACKUP or RUNSTATS that consume an excessive
amount of CPU can be throttled dynamically using the SET
UTIL_IMPACT_PRIORITY command. This command is documented in the DB2 9.7
Information Center:

http://publib.boulder.ibm.com/infocenter/db2luw/v9r7/topic/com.ibm.db2.luw.admin.cmd.doc/doc/r0011773.html

Note that the same command can be used to limit the
impact of ASYNCHRONOUS INDEX CLEANUP following an ALTER TABLE...
DETACH PARTITION statement, for range partitioned tables containing global
indexes. By default, this utility has an impact limited to 50, but this limit
can be reduced if the utility is still disruptive to the production system.
Consult the following link for further information about ASYNCHRONOUS INDEX
CLEANUP:

http://publib.boulder.ibm.com/infocenter/db2luw/v9r7/topic/com.ibm.db2.luw.admin.perf.doc/doc/c0021597.html

Instead of setting the priority at each individual utility level, all utilities running
within the instance can be throttled using the database manager configuration
parameter UTIL_IMPACT_LIM:

http://publib.boulder.ibm.com/infocenter/db2luw/v9r7/topic/com.ibm.db2.luw.admin.config.doc/doc/r0010968.html

Other activities consuming CPU 

Most high CPU investigations begin with identifying the top three or five threads
consuming CPU on the system. There are situations where the thread
consuming CPU is not directly associated with an application or a utility running on
the system. This is the case for threads belonging to the DB2 engine, which
perform tasks for the entire workload running on the system.

In this case, you need to thoroughly review the entire workload running on the
system to see if anything unusual is running. As shown in
Example 6-32, you can use the db2pd -edus -alldbp command to see if
anything unusual stands out on the system, and verify at the OS
level that the thread is still consuming a high amount of CPU. You can then use
tools such as db2top or the LIST UTILITIES SHOW DETAIL command to determine
if the thread consuming CPU can be associated with a particular activity.

For example, if the db2loggw thread (which writes transaction log records) is
associated with an unusually high CPU consumption, you can review all transaction
intensive applications running on your system, such as a large INSERT or
DELETE SQL statement.

6.3.2 I/O usage 

In the previous section, we reviewed how to match a physical device with a DB2 
file system. Recent IBM Smart Analytics System offerings using automatic 
storage have one file system per device dedicated to one logical database 
partition. If the high I/O activity is isolated to a single device, the mapping 
discussed in 6.2.2, “Disk I/O and block queue” on page 142 can help you identify 
the database partition associated with the high I/O activity. 

It is important to isolate the scope of the high I/O activity. On the IBM Smart
Analytics System, during high I/O activity, in most cases all the file systems
associated with the automatic storage path show an equally high I/O activity.
If the high I/O is isolated to a particular file system or to a particular group
of file systems, it might be caused by one of the following conditions:

► Data skew: Certain database partitions might always lag behind during query 
execution, or show a higher than average CPU and I/O usage than the rest of 
the database partitions. This specific scenario is discussed in 6.4, “Common 
scenario: Data skew” on page 195. 

► Database partition group: The high I/O usage on certain file systems can
correspond to a specific database partition group, on which you have a
specific workload or utility running.

► Hardware issues: If all the file systems reside on the same external storage,
the storage might be running with degraded performance as a result of
maintenance being run, or other hardware issues. You can check the IBM
Storage Manager Console to see if there are any error messages related to
that specific storage.

In this section, we discuss how to narrow down high I/O activity seen at the
operating system level to the database object and the application
consuming I/O.

Application using high I/O 

In this test case, the reported I/O usage appears higher than usual. The goal is to
identify whether anything unusual is running on this system. Usually, a single SQL
statement cannot be singled out as the culprit for I/O saturation.
Instead, the issue is generally caused by a combination of concurrent workloads.

To better understand the workload, we can look at the number of
applications connected and the queries being executed to see if anything looks
unusual. To understand where most of the I/O is being done and the
associated database objects (tables), we can identify the top three SQL
statements or the database objects that are most frequently accessed.

In Example 6-38, vmstat reports I/O wait time of up to 76%, with a high number of
threads in the block queue (b column). The system is I/O bound.


Example 6-38 vmstat output on I/O bound system

 r  b swpd     free    buff    cache si so     bi    bo    in     cs us sy id wa st
 1 63    0 17360300 1788424 42525572  0  0   1135   241     0      0  2  0 95  2  0
 3 63    0 17360564 1788428 42525568  0  0 840760 35994 27857 129297 21  9  2 69  0
 4 57    0 17361344 1788432 42525564  0  0 680720 20620 26648 125769 17  8  1 74  0
 6 64    0 17360656 1788444 42525552  0  0 693600 22538 27904 129783 16  8  2 74  0
 0 69    0 17362144 1788444 42525552  0  0 698272 22284 28463 132483 16  9  2 73  0
 2 61    0 17362516 1788444 42525552  0  0 741072 19504 27668 128692 16  8  3 73  0
 6 47    0 17363320 1788448 42525548  0  0 683792  2572 27666 127690 18  8  2 72  0
 3 51    0 17363700 1788448 42525548  0  0 852912 16406 31498 140055 21  9  3 67  0
 3 56    0 17363416 1788456 42525540  0  0 610752 26616 23410 116020 15  7  2 76  0
 5 53    0 17364028 1788456 42525540  0  0 805120 13354 25128 116890 20  8  3 69  0
 4 57    0 17364160 1788456 42525540  0  0 666080 26168 24629 119777 16  8  6 70  0
 1 62    0 17364040 1788468 42525528  0  0 665816 24212 26898 127206 16  8  2 74  0
 4 60    0 17364672 1788468 42525528  0  0 610168 27854 24761 120212 15  7  2 76  0
[...]
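
The reading of the vmstat output can be approximated in a few lines of Python.
The sketch below is illustrative and rests on assumptions: it expects the
17-column Linux vmstat layout (r b swpd free buff cache si so bi bo in cs us
sy id wa st), and the 50% I/O wait threshold is an arbitrary value chosen for
this example, not an IBM recommendation. The function names are hypothetical,
and the st value of 0 in the sample rows is assumed.

```python
from typing import Dict, List

VMSTAT_COLUMNS = ["r", "b", "swpd", "free", "buff", "cache", "si", "so",
                  "bi", "bo", "in", "cs", "us", "sy", "id", "wa", "st"]


def parse_vmstat_line(line: str) -> Dict[str, int]:
    """Map one vmstat data line onto the column names above."""
    fields = line.split()
    if len(fields) != len(VMSTAT_COLUMNS):
        raise ValueError("unexpected number of vmstat columns")
    return dict(zip(VMSTAT_COLUMNS, (int(f) for f in fields)))


def io_bound_samples(lines: List[str], wa_threshold: int = 50) -> List[Dict[str, int]]:
    """Return samples with high I/O wait and more blocked than runnable threads."""
    samples = [parse_vmstat_line(line) for line in lines]
    return [s for s in samples if s["wa"] >= wa_threshold and s["b"] > s["r"]]


sample = [
    "1 63 0 17360300 1788424 42525572 0 0 1135 241 0 0 2 0 95 2 0",
    "3 63 0 17360564 1788428 42525568 0 0 840760 35994 27857 129297 21 9 2 69 0",
    "4 57 0 17361344 1788432 42525564 0 0 680720 20620 26648 125769 17 8 1 74 0",
]
flagged = io_bound_samples(sample)
```

The first vmstat line reports averages since boot for most counters, which is
why it is not flagged even though its block queue is long.
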

Example 6-39 shows an iostat command excerpt with mostly read activity:
block reads range from 457K to 928K, versus 11K to 22K blocks written, in
a 2-second interval.


Example 6-39 iostat command output sample

# iostat 2

Linux 2.6.16.60-0.21-smp (ISAS56R1D2) 10/04/2010

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
          39.46    0.00   10.17   47.04    0.00    3.34

Device:      tps   Blk_read/s  Blk_wrtn/s  Blk_read  Blk_wrtn
sda         3.98         0.00      123.38         0       248
sdb      4877.61    227836.82     5492.54    457952     11040
sdc      5712.94    461882.59     8007.96    928384     16096
sdd      4054.23    372155.22    10969.15    748032     22048
sde      1712.94    303653.73     5504.48    610344     11064
dm-0     1819.40    303601.99     5500.50    610240     11056
dm-1     4155.72    372187.06    10969.15    748096     22048
dm-2     5831.34    461611.94     8007.96    927840     16096
dm-3     4949.75    227231.84     5078.61    456736     10208
dm-4        4.48         0.00       35.82         0        72
dm-5        0.00         0.00        0.00     [...]     [...]
dm-6        2.49         0.00       19.90     [...]     [...]
dm-7        0.00         0.00        0.00     [...]     [...]
sdf         0.00         0.00        0.00     [...]     [...]

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
          41.47    0.00   10.31   44.47    0.00    3.75

Device:      tps   Blk_read/s  Blk_wrtn/s  Blk_read  Blk_wrtn
sda         0.00         0.00        0.00         0     [...]
sdb      5333.50    268688.00    13716.00    537376     27432
sdc      5381.50    474768.00      656.00    949536      1312
sdd      3464.00    477328.00       56.00    954656       112
sde      1910.50    342320.00     6624.00    684640     13248
dm-0     1981.50    342224.00     6624.00    684448     13248
dm-1     3523.50    477264.00       52.00    954528       104
dm-2     5454.00    474800.00     1008.00    949600      2016
dm-3     5535.50    269776.00    14200.00    539552     28400
dm-4        0.00         0.00        0.00         0     [...]
dm-5        0.00         0.00        0.00         0     [...]
dm-6        0.00        [...]       [...]     [...]     [...]
dm-7        0.00        [...]       [...]     [...]     [...]
sdf         0.00        [...]       [...]     [...]     [...]

Based on this output, we are looking for applications driving high read activity on
the system. db2top can be used for that purpose. The method is nearly identical to
the one used to identify the application consuming the most CPU. db2top is
started from the administration node using the following command:

db2top -d bcukit

On the welcome screen, press 1 to get to the Sessions screen. Press z to sort
on the third column, which is I/O% total, and enter 2 when
prompted for the column number for descending sort. Figure 6-12 shows the
db2top screen. We can see that the application doing the most I/O is application
handle 1497.



We press a to get further details about the SQL statement run by this
application, and enter 1497 when prompted for the agent ID.

Figure 6-13 shows the details of the application. We notice the very large number
of rows read by this application: more than 25 billion rows. We can
generate a db2exfmt output to see if the plan of the query has changed.

We can also verify whether the table where most of the I/O occurs is used in the
current SQL statement. To do this, check the tables listed in the
FROM clause of the actual SQL statement.



We can press r to get back to the Sessions screen, and press T to display the
table statistics and identify where most of the I/O is being done. Because the
iostat output showed that the I/O workload was mostly reads (according to
Example 6-39 on page 171), we can sort the columns by the number of rows
read per second, which is the second column, by pressing z followed by 1.

Figure 6-14 shows that the table on which the most I/O is being done is 
TPCD.LINEITEM, which is the largest table used in the previous query. 

Note that depending on the case, we might have started the investigation by 
looking at the most frequently accessed tables. 

Figure 6-14 db2top Tables screen shot


The same exercise can be repeated to narrow down other SQL statements and
applications driving the most I/O, to further understand the workload at the DB2
level causing all the I/O activity.

DB2 I/O metrics 

In this section, we discuss a few DB2 metrics that can help you to get an overall 
picture of the I/O activity within your database, more specifically the objects 
where the I/O is being done. These metrics can help you in establishing a 
baseline to match a given I/O usage at the operating system level to a DB2 level 
of activity. This baseline is helpful to investigate issues with I/O bottlenecks when 
they arise, in order to understand if the I/O subsystem issue is caused by a 
particular level of DB2 workload. 

These metrics can also help you to make decisions in prioritizing maintenance 
tasks such as table reorganization or RUNSTATS to achieve optimal I/O 
utilization. 
DB2 buffer pool hit ratio

By default, the IBM Smart Analytics System uses a single unified buffer pool. 
Details of the IBM Smart Analytics System buffer pool design are discussed in 
Chapter 7, “Advanced configuration and tuning” on page 203. All relevant DB2 
buffer pool metrics are related to the single BP16K buffer pool. 

A good metric for buffer pool monitoring is the overall buffer pool hit ratio. As 
shown in Figure 6-15, db2top can be used to show a high level buffer pool ratio; 
you can get to the Bufferpool metrics by pressing b. 



The DB2 9.7 relational monitoring function MON_GET_BUFFERPOOL provides 
detailed information about the buffer pool activity, including the buffer pool hit 
ratio. Example 6-40 shows an example of an SQL query returning the overall 
buffer pool hit ratio. 

Example 6-40 Overall buffer pool hit ratio 

WITH BPMETRICS AS
( SELECT bp_name, pool_data_l_reads + pool_temp_data_l_reads
+ pool_index_l_reads + pool_temp_index_l_reads + pool_xda_l_reads
+ pool_temp_xda_l_reads AS logical_reads, pool_data_p_reads
+ pool_temp_data_p_reads + pool_index_p_reads + pool_temp_index_p_reads
+ pool_xda_p_reads + pool_temp_xda_p_reads AS physical_reads,
pool_read_time, member
FROM TABLE (MON_GET_BUFFERPOOL('', -2)) AS METRICS)
SELECT MEMBER, VARCHAR(bp_name,20) AS bp_name,
logical_reads, physical_reads, pool_read_time,
CASE
WHEN logical_reads > 0
THEN DEC((1 - (FLOAT(physical_reads) / FLOAT(logical_reads))) * 100,5,2)
ELSE NULL END
AS HIT_RATIO
FROM BPMETRICS
WHERE BP_NAME NOT LIKE 'IBM%'
ORDER BY MEMBER, BP_NAME

MEMBER BP_NAME LOGICAL_READS PHYSICAL_READS POOL_READ_TIME HIT_RATIO

     0 BP16K            2684             89             31     96.68
     1 BP16K       219495951       58457595      270435893     73.36
     2 BP16K       223792271       63126325      224470159     71.79
     3 BP16K       223496129       65649024      184997732     70.62
     4 BP16K       221308949       63640936      169667848     71.24
     5 BP16K       221315888       63604146      195739067     71.26
     6 BP16K       220618242       58597348      263506121     73.43
     7 BP16K       220317104       61109821      243800130     72.26
     8 BP16K       216830746       58440331      266063848     73.04


In a data warehousing environment, the buffer pool hit ratio can be very low due
to the relatively large size of the table scans compared to the buffer pool size.

However, the previous result gives a good baseline for tracking the amount of
physical versus logical reads taking place on the database. This can be useful
to identify whether an I/O bound system is the result of an increased number of
physical reads, for example. If it is not, and the system is experiencing a high
I/O wait, there might be other issues within the I/O subsystem that cause the
I/O to run in degraded mode (due to hardware issues, for example).
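
When the counters are collected regularly for a baseline, the CASE expression
used in Example 6-40 can also be applied outside of DB2. The following is a
minimal Python sketch under that assumption; the function name hit_ratio is
hypothetical, and the sample values are the member 0 counters from
Example 6-40.

```python
def hit_ratio(logical_reads, physical_reads):
    """Return the buffer pool hit ratio in percent.

    Mirrors the CASE expression of Example 6-40: the result is None
    (NULL in SQL) when no logical reads have been recorded.
    """
    if logical_reads <= 0:
        return None
    return (1 - physical_reads / logical_reads) * 100


# Member 0 counters from Example 6-40: 2684 logical reads, 89 physical reads.
member0 = hit_ratio(2684, 89)
print(f"{member0:.2f}")
```

Because DB2 converts the result to DEC(5,2), the last digit reported by the SQL
query can differ slightly from a rounded floating point value.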

Rows read per row returned 

A metric that can help measure the efficiency of the I/O made by the application
is the ratio of rows read to rows returned. A high ratio might be an indication
of poor access plan choices. Regular collection of this data can also help in
establishing a baseline for your application I/O consumption pattern. A sudden
degradation against that baseline can point to a change to a poorer access plan.

The DB2 administrative view MON_CONNECTION_SUMMARY offers this 
metric, with the column ROWS_READ_PER_ROWS_RETURNED. Note that 
other relevant data related to the I/O returned by this administrative view include 
IO_WAIT_TIME_PERCENT and TOTAL_BP_HIT_RATIO_PERCENT. 
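
As an illustration of how such a baseline might be used, the following Python
sketch computes the ratio and flags a sudden degradation. It is not taken from
the DB2 tooling: the function names, the sample counters, and the factor of 10
used as a threshold are all hypothetical choices.

```python
def rows_read_per_row_returned(rows_read, rows_returned):
    """Return rows read per row returned, or None if nothing was returned."""
    if rows_returned <= 0:
        return None
    return rows_read / rows_returned


def is_degraded(current_ratio, baseline_ratio, factor=10):
    """Flag a ratio that exceeds the baseline by the given (assumed) factor."""
    if current_ratio is None or baseline_ratio is None:
        return False
    return current_ratio > baseline_ratio * factor


# Hypothetical counters: a baseline collection and a later collection.
baseline = rows_read_per_row_returned(1_000_000, 10_000)
current = rows_read_per_row_returned(500_000_000, 10_000)
print(baseline, current, is_degraded(current, baseline))
```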

“Hot” tables 

In order to get a quick idea of the regular tables where most of the I/O is being
done within your database, we can run an SQL query based on the DB2 9.7
relational monitoring function MON_GET_TABLE, as shown in Example 6-41.
We can see that the table TPCD.LINEITEM is by far the most accessed table
within this database. From a database administration perspective, we must ensure
that the table is well maintained by running the REORGCHK_TB_STATS procedure,
for example, to make sure that the table does not need a reorganization.

In certain production databases, many unused tables can accumulate. This can 
impact the performance of utilities such as BACKUP, and affect data placement 
within the table space. Note that these statistics apply since the last time the 
database was activated. So, it might be normal to see certain tables not having 
any usage yet. If the database has been active for a long time where all the 
workload on your system has gone through an entire cycle, and there are still a 
few unused tables, check with your application team if they can be dropped. 
Generally, the preferred method is to rename the tables first for an entire
workload cycle after consulting with the application team. This action helps
ensure that no applications receive errors because they try to access the tables.
After it has been verified, the tables can be dropped. See Example 6-41.


Example 6-41 “Hot” tables

SELECT SUBSTR(TABSCHEMA,1,10) AS TABSCHEMA,
SUBSTR(TABNAME,1,15) AS TABNAME, SUM(TABLE_SCANS) AS SUM_TSCANS,
SUM(ROWS_READ) AS SUM_RW_RD, SUM(ROWS_INSERTED) AS SUM_RW_INS,
SUM(ROWS_UPDATED) AS SUM_RW_UPD, SUM(ROWS_DELETED) AS SUM_RW_DEL
FROM TABLE (MON_GET_TABLE('', '', -2))
WHERE TAB_TYPE = 'USER_TABLE'
GROUP BY TABSCHEMA, TABNAME ORDER BY SUM_TSCANS DESC

TABSCHEMA  TABNAME         SUM_TSCANS    SUM_RW_RD ...

TPCD       LINEITEM             15963 144456035062 ...
TPCD       ORDERS                6034  13649670205 ...
TPCD       PART                   328   8115490496 ...
TPCD       PARTSUPP               176  17103799091 ...
TPCD       SUPPLIER                88    125316398 ...
TPCD       CUSTOMER                64   1319998703 ...
BCULINUX   WLM_EVENT       ...
BCULINUX   WLM_EVENT_CONTR ...
BCULINUX   WLM_EVENT_STMT  ...
BCULINUX   WLM_EVENT_VALS  ...
BCULINUX   WLM_STATS_CONTR ...
BCULINUX   WLM_STATS_HIST  ...
BCULINUX   WLM_STATS_Q     ...
BCULINUX   WLM_STATS_SC    ...
BCULINUX   WLM_STATS_WC    ...
BCULINUX   WLM_STATS_WL    ...
BCULINUX   WLM_THRESH_CONT ...
BCULINUX   WLM_THRESH_VIOL ...
SYSTOOLS   OPT_PROFILE     ...
TPCD       NATION          ...
TPCD       REGION          ...

21 record(s) selected.


Index usage 

A query similar to those shown previously can be run to identify most used 
indexes, as well as unused ones, as shown in Example 6-42. 

We can see that the most frequently accessed index is on TPCD.SUPPLIER.
Also, you can see that there are indexes not being used at all. In this case, you
can further check if the tables have appropriate indexes. In Example 6-41, we
saw that the TPCD.LINEITEM table had the most table scans, but not a single
index access. In this case, we can run the DB2 Design Advisor utility, db2advis,
to check if there are suggestions for a better index. There might be none.
db2advis is documented at the following link:

http://publib.boulder.ibm.com/infocenter/db2luw/v9r7/topic/com.ibm.db2.luw.admin.cmd.doc/doc/r0002452.html
Example 6-42 Index usage

SELECT SUBSTR(S.INDSCHEMA,1,10) AS INDSCHEMA,
SUBSTR(S.INDNAME,1,15) AS INDNAME,
SUBSTR(S.TABNAME,1,15) AS TABNAME,
SUM(T.INDEX_SCANS) AS SUM_IX_SCANS,
SUM(T.INDEX_ONLY_SCANS) AS SUM_IX_ONLY_SCANS
FROM TABLE (MON_GET_INDEX('', '', -2)) AS T,
SYSCAT.INDEXES AS S WHERE T.TABSCHEMA = S.TABSCHEMA
AND T.TABNAME = S.TABNAME AND T.IID = S.IID
AND T.TABSCHEMA NOT LIKE 'SYS%'
GROUP BY S.INDSCHEMA, S.INDNAME, S.TABNAME
ORDER BY SUM_IX_SCANS DESC

INDSCHEMA  INDNAME TABNAME  SUM_IX_SCANS ...

TPCD       S_NK    SUPPLIER          184 ...
TPCD       N_NK    NATION             33 ...
TPCD       C_NK    CUSTOMER           32 ...
TPCD       PS_PKSK PARTSUPP           16 ...
TPCD       C_CK    CUSTOMER ...
TPCD       O_OK    ORDERS   ...
TPCD       PS_PK   PARTSUPP ...
TPCD       S_SK    SUPPLIER ...
TPCD       N_RK    NATION   ...
TPCD       R_RK    REGION   ...
TPCD       L_OKLN  LINEITEM ...
TPCD       O_CK    ORDERS   ...
TPCD       PS_SK   PARTSUPP ...
TPCD       PS_SKPK PARTSUPP ...
TPCD       P_PK    PART     ...

15 record(s) selected.


Prefetching and page cleaning 

IBM Smart Analytics System prefetch-related settings are discussed in 7.1.2,
“DB2 configuration” on page 217. The SYSIBMADM.MON_BP_UTILIZATION
administrative view provides relevant metrics, including these:

► The prefetch ratio represents the percentage of physical reads that were done
using asynchronous I/O (prefetching).

► The unread prefetch metric represents the percentage of pages that were
retrieved through prefetching, but were never used.

► The percentage of synchronous writes represents the proportion of writes that
agents had to perform synchronously to free space for their own pages in the
buffer pool. This number is generally very low when page cleaning is
efficient. The related parameter CHNGPGS_THRESH is set lower in the IBM Smart
Analytics System 7700 environment; see “Database configuration settings” on
page 228 for further information.

Example 6-43 shows a usage example of the administrative view
SYSIBMADM.MON_BP_UTILIZATION.
Example 6-43 Index hit ratio and prefetch ratio from MON_BP_UTILIZATION

SELECT SUBSTR(BP_NAME,1,10) AS BP_NAME,
MEMBER, DATA_HIT_RATIO_PERCENT AS DATA_HIT_RATIO,
INDEX_HIT_RATIO_PERCENT AS INDEX_HIT_RATIO,
PREFETCH_RATIO_PERCENT AS PREFETCH_RATIO,
ASYNC_NOT_READ_PERCENT AS UNRD_PREFETCH_RATIO,
SYNC_WRITES_PERCENT AS SYNC_WRITES_RATIO
FROM SYSIBMADM.MON_BP_UTILIZATION
WHERE BP_NAME NOT LIKE 'IBM%'
ORDER BY BP_NAME, MEMBER


BP_NAME MEMBER DATA_HIT_RATIO INDEX_HIT_RATIO ...

BP16K        0          99.29            0.00 ...
BP16K        1          60.94           40.50 ...
BP16K        2          53.36           68.60 ...
BP16K        3          51.88           78.84 ...
BP16K        4          52.49           71.64 ...
BP16K        5          52.68           70.34 ...
BP16K        6          59.24           46.99 ...
BP16K        7          57.66           52.37 ...
BP16K        8          58.47           48.64 ...

... PREFETCH_RATIO UNRD_PREFETCH_RATIO SYNC_WRITES_RATIO

4.05 0.47 
1.60 0.41 
0.94 0.41 
1.34 0.44 
1.20 0.43 
3.28 0.46 
2.41 0.47 
2.92 0.46 


9 record(s) selected. 


Temporary table space I/O 

The I/O usage of the temporary table space is a key metric to monitor and keep 
track of the temporary spill usage within your database. For an IBM Smart 
Analytics System that has SSD devices such as the IBM Smart Analytics System 
5600 with the SSD option and the IBM Smart Analytics System 7700, it is 
essential to understand the system temporary table space usage of your 
database. 

Consumers of the temporary table space include these possibilities: 

► Sort spills: Sorts can be spilled to temporary table space during query
processing. Sorts can also spill during index creation. These sort spills can
be monitored through the sort overflow metrics.

► Hash join spills: These spills occur when hash join data exceeds sortheap.
These spills can be monitored through the hash join overflow metric.

► Optimizer TEMP operations: The optimizer can use a TEMP low-level plan
operator during query processing. The query access plan shows the TEMP
operator, along with an estimated size of the spill. Cardinality
underestimations in the access plan can cause a high temporary table space
usage.

► Table queue spills: When the receiver end cannot receive table queue buffers 
fast enough, the table queue buffers are spilled to the temporary table space. 
The table queue spills can be monitored through the application snapshot 
metric. 

► Utility usage: Utilities such as LOAD, REORG, or REDISTRIBUTE can use 
the temporary table space. 
The db2top tool can be used to track the top five temporary table space
consumers. From the welcome screen, enter T to get to the db2top Tables
screen, as shown in Figure 6-16.



Enter L to get the five top applications consuming the temporary table space. 
Figure 6-17 shows that application handle 125 is the top application consuming 
the temporary table space. 
To find out more details about the application consuming temporary table space,
we press any key to get back to the Tables screen, followed by a. When prompted
for the agent ID, we enter 132. We can get further details about the number of
sort and hash join overflows. See Figure 6-18. For this particular query, these
overflows do not appear to be the problem. We do see that there are table queue
spills, which are likely the source of the temporary table space usage. We can
also get details about the SQL statement being executed, so that we can
generate an explain output for it.



Here are relevant metrics to get an idea of the temporary table space usage: 

► Sort, hash join, and table queue overflows made by the various applications: 
We have shown how to monitor these metrics at the application level based 
on the application temporary table space usage. Further examples of tracking 
these overflows are discussed in 7.1.2, “DB2 configuration” on page 217. 

► Maximum high water mark usage:

The maximum high water mark table space usage represents the highest
water mark in table space usage reached since the database was activated.
This metric allows you to check if your temporary table space is of sufficient
size. Specifically, on the IBM Smart Analytics System 5600 and 7700,
this metric allows you to verify if the spills are contained within the SSD
container. 7.1.2, “DB2 configuration” on page 217 has details on how to verify
the temporary table space usage.

► Buffer pool temporary hit ratio: 

The Table screen in db2top (Figure 6-16 on page 180) shows detailed 
information about the temporary table accesses in terms of rows read and 
written; the rows can be sorted by Rows read or Rows written per second. 
You can also use relational monitoring function MON_GET_BUFFERPOOL to 
get details on the buffer pool hit ratio for temporary table space, as shown in 
Example 6-44. This can give you an idea of the amount of physical reads 
versus logical reads for temporary data. 

Example 6-44 Temporary buffer pool hit ratio 


WITH BPMETRICS AS ( SELECT bp_name, pool_data_l_reads
+ pool_temp_data_l_reads + pool_index_l_reads
+ pool_temp_index_l_reads + pool_xda_l_reads
+ pool_temp_xda_l_reads AS logical_reads,
pool_data_p_reads + pool_temp_data_p_reads
+ pool_index_p_reads + pool_temp_index_p_reads
+ pool_xda_p_reads + pool_temp_xda_p_reads AS physical_reads,
pool_read_time, member
FROM TABLE(MON_GET_BUFFERPOOL('', -2)) AS METRICS)
SELECT MEMBER, VARCHAR(bp_name,20) AS bp_name,
logical_reads, physical_reads, pool_read_time,
CASE
WHEN logical_reads > 0
THEN DEC((1 - (FLOAT(physical_reads) / FLOAT(logical_reads))) * 100,5,2)
ELSE NULL
END AS HIT_RATIO
FROM BPMETRICS
WHERE BP_NAME NOT LIKE 'IBM%' ORDER BY MEMBER, BP_NAME


MEMBER BP_NAME 


PHYSICAL_READS POOLREADTIME 


1 BP16K 

2 BP16K 

3 BP16K 


18572 

337839353 

32576653 

323382702 

330858029 

333520520 

336701269 

338874468 

338250903 


84574232 

95117898 

94027348 

95877816 

96618758 


370561738 

271783119 

202064324 

247567098 

257182659 

347805047 

324759633 

335470505 


Relational monitoring function MON_GET_TABLESPACE can be used to 
track the amount of physical writes made to the temporary table space as 
well, according to the query shown in Example 6-45. 


Example 6-45 Tracking physical writes on the temporary table space 

SELECT SUBSTR(TBSP_NAME,1,10) AS TBSP_NAME, TBSP_ID,
SUM(POOL_TEMP_DATA_L_READS + POOL_TEMP_XDA_L_READS
+ POOL_TEMP_INDEX_L_READS) AS SUM_TEMP_LOG_RD,
SUM(POOL_TEMP_DATA_P_READS + POOL_TEMP_INDEX_P_READS
+ POOL_TEMP_XDA_P_READS) AS SUM_TEMP_PHYS_RD,
SUM(POOL_DATA_WRITES + POOL_XDA_WRITES
+ POOL_INDEX_WRITES) AS SUM_TEMP_POOL_WRITES,
MAX(TBSP_MAX_PAGE_TOP) AS MAX_TBSP_MAX_PAGE_TOP
FROM TABLE(MON_GET_TABLESPACE('', -2))
WHERE TBSP_CONTENT_TYPE LIKE '%TEMP'
GROUP BY TBSP_NAME, TBSP_ID
ORDER BY TBSP_NAME, TBSP_ID

TBSP_NAME TBSP_ID SUM_TEMP_LOG_RD SUM_TEMP_PHYS_RD ...

TEMP16K       260       201654246         48394124 ...

1 record(s) selected.

... SUM_TEMP_POOL_WRITES MAX_TBSP_MAX_PAGE_TOP

...             61706796               2689088

► Temporary table compression: 

DB2 9.7 can potentially use temporary table compression for sort spills,
optimizer TEMP operations, and table queue spills. You can measure how
effective the compression is by running db2pd with the -temptable flag.

In Example 6-46, we can see the following information for the first logical
database partition on the first data node:

- There were 106 system temporary tables in total since the database was
activated.

- Four out of 106 were eligible for compression (flagged by the DB2 optimizer
as being candidates for compression), and one was actually compressed
(triggered during runtime, based on criteria such as the size of the spill).

- A total of 853 MB were spilled. The compression ratio is around 10%.

The compression ratio is defined as follows:

Total Sys Temp Bytes Saved / (Total Sys Temp Bytes Saved + Total Sys Temp Bytes Stored)


Example 6-46 db2pd -db bcukit -temptable output

System Temp Table Stats:
Number of System Temp Tables : 106
Comp Eligible Sys Temps : 4
Compressed Sys Temps : 1
Total Sys Temp Bytes Stored : 895165003
Total Sys Temp Bytes Saved : 101051400
Total Sys Temp Compressed Rows : 61200
Total Sys Temp Table Rows: : 7811414

User Temp Table Stats:
Number of User Temp Tables : 0
Comp Eligible User Temps : 0
Compressed User Temps : 0
Total User Temp Bytes Stored : 0
Total User Temp Bytes Saved : 0
Total User Temp Compressed Rows : 0
Total User Temp Table Rows: : 0

System Temp Table Stats:
Number of System Temp Tables : 113
Comp Eligible Sys Temps : 10
Compressed Sys Temps : 3
Total Sys Temp Bytes Stored : 1001945997
Total Sys Temp Bytes Saved : 1591836900
Total Sys Temp Compressed Rows : 702200
Total Sys Temp Table Rows: : 7823364

User Temp Table Stats:
Number of User Temp Tables :
Comp Eligible User Temps :
Compressed User Temps :
Total User Temp Bytes Stored :
Total User Temp Bytes Saved :
Total User Temp Compressed Rows :
Total User Temp Table Rows: :

System Temp Table Stats:
Number of System Temp Tables :
Comp Eligible Sys Temps :
Compressed Sys Temps :
Total Sys Temp Bytes Stored : 1020964698
Total Sys Temp Bytes Saved : 449107550
Total Sys Temp Compressed Rows : 1870400
Total Sys Temp Table Rows: : 7841952

User Temp Table Stats:
Number of User Temp Tables :
Comp Eligible User Temps :
Compressed User Temps :
Total User Temp Bytes Stored :
Total User Temp Bytes Saved :
Total User Temp Compressed Rows :
Total User Temp Table Rows: :

System Temp Table Stats:
Number of System Temp Tables :
Comp Eligible Sys Temps :
Compressed Sys Temps :
Total Sys Temp Bytes Stored : 990281374
Total Sys Temp Bytes Saved : 10051760
Total Sys Temp Compressed Rows : 61200
Total Sys Temp Table Rows: : 7813753

User Temp Table Stats:
Number of User Temp Tables : 0
Comp Eligible User Temps : 0
Compressed User Temps : 0
Total User Temp Bytes Stored : 0
Total User Temp Bytes Saved : 0
Total User Temp Compressed Rows : 0
Total User Temp Table Rows: : 0
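
The compression ratio defined before Example 6-46 can be computed directly from
the db2pd -temptable counters. The following Python sketch is only an
illustration; the function name temp_compression_ratio is hypothetical, and the
input values are the Total Sys Temp Bytes Saved and Total Sys Temp Bytes Stored
counters reported for the first logical database partition.

```python
def temp_compression_ratio(bytes_saved, bytes_stored):
    """Return saved / (saved + stored), or None if nothing was spilled."""
    total = bytes_saved + bytes_stored
    if total == 0:
        return None
    return bytes_saved / total


# Counters of the first logical database partition in Example 6-46.
bytes_stored = 895165003
bytes_saved = 101051400

ratio = temp_compression_ratio(bytes_saved, bytes_stored)
spilled_mb = bytes_stored / (1024 * 1024)
print(f"spilled: {spilled_mb:.0f} MB, compression ratio: {ratio:.1%}")
```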


Utilities using high I/O 

If you do not see any applications doing an excessive amount of I/O, check for 
utilities doing a high amount of I/O, such as LOAD or BACKUP. These utilities do 
not perform I/O using the DB2 buffer pool. Their I/O is tracked as direct reads and 
direct writes. These metrics are returned by MON_GET_BUFFERPOOL 
relational monitoring function as well as buffer pool snapshots. 
As shown in Example 6-47, you can track the number of direct reads and writes,
as well as the time it takes to perform those I/O operations. You can compute the
number of direct read and write operations performed per request. As we can
see in the example, in this case, there appears to be I/O done against specific
database partitions 1, 2, 6, and 8 only.

Example 6-47 Direct reads and writes output 

SELECT SUBSTR(BP_NAME,1,10) AS BP_NAME, MEMBER, DIRECT_READS,
CASE WHEN DIRECT_READ_REQS > 0
THEN INT(BIGINT(DIRECT_READS) / BIGINT(DIRECT_READ_REQS))
ELSE NULL END AS DIRECT_READ_PER_REQ,
DIRECT_READ_TIME, DIRECT_WRITES,
CASE WHEN DIRECT_WRITE_REQS > 0
THEN INT(BIGINT(DIRECT_WRITES) / BIGINT(DIRECT_WRITE_REQS))
ELSE NULL END AS DIRECT_WRITE_PER_REQ,
DIRECT_WRITE_TIME FROM TABLE (MON_GET_BUFFERPOOL('', -2))
WHERE BP_NAME NOT LIKE 'IBM%'
ORDER BY BP_NAME, MEMBER

BP_NAME MEMBER DIRECT_READS DIRECT_READ_PER_REQ DIRECT_READ_TIME ...

BP16K        0           24
BP16K        1          480                  32              126 ...
BP16K        2          480                  32              245 ...
BP16K        3          256                  32               76 ...
BP16K        4          256                  32               14 ...
BP16K        5          256                  32                7 ...
BP16K        6           96                  32               18 ...
BP16K        7          256                  32                5 ...
BP16K        8          480                  32              135 ...

... DIRECT_WRITES DIRECT_WRITE_PER_REQ DIRECT_WRITE_TIME

...          1080                   77               254
...       7623552                  955            396753
...       7617184                  955            283409
...          1120                  280                12
...          1120                  280                 9
...          1120                  280                10
...       8779296                  914            389882
...          1120                  280                13
...       7596800                  955            243858

9 record(s) selected.


If you see a large number of direct reads and direct writes potentially affecting the 
I/O on your system, you can narrow down any utilities running on the system 
using the LIST UTILITIES SHOW DETAIL command, and throttle the utility. 
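
The per-request figures in Example 6-47 come from an integer division that is
guarded against a zero request count. The following Python sketch mirrors that
CASE expression for counters collected outside of DB2; the function name
per_request and the request count of 15 are hypothetical, chosen only to
illustrate the calculation.

```python
def per_request(total, requests):
    """Return the whole number of direct reads or writes per request.

    Returns None (NULL in SQL) when no requests have been recorded,
    as the CASE expression in Example 6-47 does.
    """
    if requests <= 0:
        return None
    return total // requests


# Hypothetical counters: 480 direct reads performed in 15 requests.
print(per_request(480, 15))
print(per_request(0, 0))
```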
6.3.3 DB2 memory usage


To investigate memory-related issues on an IBM Smart Analytics System, it is
essential to understand how DB2 uses memory. In this section we discuss the
various memory allocations done within DB2, in order to account for all the
memory usage.

Global memory management parameters 

Two global memory parameters are used to cap the memory usage on each 
server. These parameters are left set to AUTOMATIC by default: 

► INSTANCE_MEMORY: 

This database manager configuration parameter controls the maximum 
amount of memory that can be used by each DB2 logical database partition, 
including all shared memory and private memory usage for the agents 
associated to that particular logical database partition. 

INSTANCE_MEMORY can be set either to AUTOMATIC or a specific value: 

- If set to AUTOMATIC and SELF_TUNING_MEM (STMM) is ON, this
parameter enables automatic tuning of instance memory according to the
memory available at the operating system level. In this configuration,
DB2 will consume between 75% and 95% of the RAM for all the database
partitions within a server. However, because STMM is disabled by default
on the IBM Smart Analytics System 5600, 7600, and 7700, this setting is
ignored on these offerings.

- If set to a specific value, this value will cap the amount of memory used by 
the logical database partition. This setting is not desirable for the IBM 
Smart Analytics System. 

► DATABASE_MEMORY: 

This database configuration parameter represents the maximum amount of
memory usable for the database shared memory allocations. On DB2 9.7,
this value is equal to the sum of the individual memory pool allocations
within the database shared memory set, such as the buffer pools and the
utility heap, plus additional memory to accommodate dynamic memory growth.
This parameter is left at its default value of AUTOMATIC for the IBM Smart
Analytics System. The following settings are allowed:

- AUTOMATIC: Default value. If STMM is enabled, DATABASE_MEMORY
will be tuned automatically.

- COMPUTED: Values are calculated based on the sum of the database
shared memory set and other heap settings during database startup.
Provision is made for dynamic growth in the calculations. If STMM
is disabled, as in the IBM Smart Analytics System environments,
AUTOMATIC and COMPUTED hold the same meaning.

- Specific value: The value will act as a cap for all database shared memory
requirements. If this value cannot be allocated initially or is higher than
INSTANCE_MEMORY, database activation will fail.

► SELF_TUNING_MEM: 

This database configuration parameter allows you to turn the DB2 Self-Tuning 
Memory Manager ON. This parameter is turned OFF by default with the IBM 
Smart Analytics System. 

DB2 memory allocations 

There are two main types of memory allocation within DB2:

► Shared memory allocations: There are various types of shared memory sets 
allocated within DB2 at the instance, database, and application level. 

► Private memory allocations: Private memory allocated at each individual DB2 
EDU level. 

A summary of the DB2 memory usage statistics is provided by the command
db2pd -dbptnmem, as shown in Example 6-48. It shows the memory allocation
limit for the database partition, which corresponds to INSTANCE_MEMORY
(Memory Limit), and the current total memory consumption for the logical
database partition (Current usage). It then breaks the consumption down into
individual consumers: the total application memory, instance shared memory
set, private memory, FCM shared memory set, database shared memory, and FMP
shared memory set. For each consumer, it gives the current usage, the high
water mark usage, and the cached memory usage. These memory sets are discussed
in more detail in the following sections.

Example 6-48 db2pd -dbptnmem 

bculinux@ISAS56R1D2:~> db2pd -dbptnmem -dbp 1

Database Partition 1 -- Active -- Up 0 days 03:05:59 -- Date 10/06/2010
00:45:24

Database Partition Memory Controller Statistics

Controller Automatic: Y
Memory Limit: 14838876 KB
Current usage: 5474048 KB
HWM usage: 5954368 KB
Cached memory: 1062848 KB

Individual Memory Consumers:

Name           Mem Used (KB) HWM Used (KB) Cached (KB)

APPL-BCUKIT           160000        160000      150784
DBMS-bculinux          34048         34048        1344
FMP_RESOURCES          22528         22528           0
PRIVATE               914880       1234368      361728
FCM_RESOURCES         355136        562560           0
DB-BCUKIT            3987200       3987200      548992
LCL-p9043                128           128           0
LCL-p9043                128           128           0

Note that the db2pd -dbptnmem command output includes additional virtual 
memory allocation to accommodate any potential growth, not just the actual 
system memory usage. 
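
As a cross-check on Example 6-48, the Mem Used values of the individual memory
consumers add up to the Current usage figure. The following Python sketch
illustrates the check with the values from that example; the variable names are
hypothetical.

```python
# (consumer name, Mem Used in KB) pairs as listed in Example 6-48.
consumers_kb = [
    ("APPL-BCUKIT", 160000),
    ("DBMS-bculinux", 34048),
    ("FMP_RESOURCES", 22528),
    ("PRIVATE", 914880),
    ("FCM_RESOURCES", 355136),
    ("DB-BCUKIT", 3987200),
    ("LCL-p9043", 128),
    ("LCL-p9043", 128),
]
current_usage_kb = 5474048  # "Current usage" in Example 6-48

total_kb = sum(used for _, used in consumers_kb)
largest = max(consumers_kb, key=lambda consumer: consumer[1])

print(total_kb == current_usage_kb)
print(largest[0])
```

In this output, the database shared memory set (DB-BCUKIT) is the largest
consumer, followed by private memory.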

The table function ADMIN_GET_DBP_MEM_USAGE can also be used to show 
the instance memory usage as shown in Example 6-49. 

Example 6-49 ADMIN_GET_DBP_MEM_USAGE

$ db2 "select * from table (sysproc.admin_get_dbp_mem_usage()) as t where
DBPARTITIONNUM=1"

DBPARTITIONNUM MAX_PARTITION_MEM CURRENT_PARTITION_MEM PEAK_PARTITION_MEM

             1       15195009024            5605425152         6097272832

1 record(s) selected. 
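
The table function reports its values in bytes, whereas db2pd -dbptnmem reports
KB. The following Python sketch, with hypothetical variable names, shows that
the values of Example 6-49 correspond to the Memory Limit, Current usage, and
HWM usage values of Example 6-48 once converted.

```python
BYTES_PER_KB = 1024

# Values in KB from db2pd -dbptnmem (Example 6-48).
dbptnmem_kb = {"limit": 14838876, "current": 5474048, "hwm": 5954368}

# Values in bytes from ADMIN_GET_DBP_MEM_USAGE (Example 6-49):
# MAX_PARTITION_MEM, CURRENT_PARTITION_MEM, and PEAK_PARTITION_MEM.
admin_bytes = {"limit": 15195009024, "current": 5605425152, "hwm": 6097272832}

matches = {
    name: admin_bytes[name] == dbptnmem_kb[name] * BYTES_PER_KB
    for name in dbptnmem_kb
}
print(matches)
```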


Shared memory allocations 

The following shared memory sets are allocated by DB2: 

► Instance level shared memory sets: Allocated when the instance is started.
These segments are allocated once per server, so when performing memory
usage calculations, they must not be counted once for each logical database
partition on the same server. This includes the following shared memory
segments:

- Database manager shared memory set: Includes memory allocations for 
heaps such as monitor heaps, and other internal heaps. 

- FCM shared memory set: Used for FCM resources memory allocation, 
such as FCM buffers, and channels. Generally the main memory 
consumer at the instance level for IBM Smart Analytics System. 

- FMP shared memory set: Shared memory segment allocated for 
communication with db2fmp threads. 

- Trace shared memory segment: Allocated for the trace segment for DB2 
trace. 
Instance-level shared memory set allocations can be tracked using the
db2pd -inst -memsets command, as shown in Figure 6-19.



Figure 6-19 db2pd -inst -memsets


The Size (Kb) column represents the shared memory segment size. The 
Committed memory Cmt (Kb) column represents the amount of memory 
committed at the operating system level, and the HWM (Kb) is the highest 
memory usage in the pool since the instance was started. For an actual 
memory usage calculation, you can rely on the Cmt (Kb) column for the actual 
allocation. This shared memory is allocated at the server level. 

You can further drill down the various memory pool allocations within each of 
these memory sets by using the db2pd -inst -memsets -mempools 
command, as shown in Figure 6-20. This output contains each individual 
memory pool allocation within each memory set. You can, for example, see 
the drill down for all memory allocations within the FCM shared memory set. 
The physical HWM sum for all memory pools within the set has to match 
approximately the HWM (Kb) column for the memory set in the output in 
Figure 6-19. 



► Database and application level shared memory sets: Database level shared 
memory sets are allocated when the database is activated. These shared 
memory segments are allocated for each database partition: 

- Database shared memory set: Generally the main memory consumer for 
IBM Smart Analytics System. Includes allocations for memory heaps such 
as buffer pools, package cache, locklist, utility heap, and database heap. 
- Application shared memory set: Application group shared memory
segment controlled by APPLICATION_MEMORY.

- Application control shared heap segments: Application level shared 
memory set. Not allocated when the database is activated, but allocated 
as needed, depending on the number of applications connected to the 
database partition. 

Database level shared memory set allocations can be tracked for each logical
database partition using the db2pd -db <db-name> -memsets command, as
shown in Figure 6-21.

For example, for database partition 1, the output shows that the total of the
memory committed is 3226112 KB, which is approximately 3 GB. The default
database memory allocation is identical for the four logical database
partitions. Therefore, we are using approximately 12 GB for database shared
memory per server. This can be verified using the command db2pd -db
bcukit -memsets -alldbp | grep BCUKIT. The application memory allocation
amounts to approximately 36 MB for the entire database partition at the time
the output was collected, which is negligible in this case.
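
The per-server estimate above is a simple multiplication. It can be sketched in
Python as follows, assuming, as stated in the text, that the four logical
database partitions have identical allocations; the function name kb_to_gb is
hypothetical.

```python
def kb_to_gb(kb):
    """Convert a value in KB to GB (1 GB = 1024 * 1024 KB)."""
    return kb / (1024 * 1024)


committed_kb_per_partition = 3226112  # committed memory for partition 1
partitions_per_server = 4             # logical database partitions per server

per_partition_gb = kb_to_gb(committed_kb_per_partition)
per_server_gb = per_partition_gb * partitions_per_server
print(f"{per_partition_gb:.2f} GB per partition, {per_server_gb:.1f} GB per server")
```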
Private memory allocations

The private memory allocations are made in a single large memory area per
database partition, allocated from the db2sysc process private memory. At the
OS level, this area of memory is shared by all the threads within the db2sysc
process. At the DB2 level, DB2 manages this memory in thread-specific memory
pools. Private memory allocations include all the private memory allocations
made by the various DB2 EDUs within a db2sysc process. For the IBM Smart
Analytics System, the main consumer is SORTHEAP. The default configuration
uses private sorts, with a large SHEAPTHRES. See 7.1.2, “DB2 configuration”
on page 217 for further details about sort configurations with the IBM Smart
Analytics System.

To estimate the total amount of private memory allocations for all the DB2 agents
for each database partition, we can use the data returned from db2pd -dbptnmem
output, as shown previously in Example 6-48 on page 187.

The output shows that the total amount of private memory used is
914880 KB, which is approximately 893 MB. The memory used corresponds to
the actual memory allocated. In order to account for all private memory
allocations within this server, we need to add up this value for the four
logical partitions within the server, which can be obtained through the command
db2pd -dbptnmem -alldbp.

Memory usage calculation example 

In this example, we perform an estimate of the memory used by the first data 
node, based on the outputs of commands reviewed previously. 

► Instance level shared memory: 

Figure 6-22 shows a sample output of db2pd -inst -memsets. 

Based on this output, the total committed memory for the instance level shared
memory sets is as follows:

Instance level shared memory Cmt (KB) = 12480 + 22592 + 39240 + 1325568 =
1399880 KB

Figure 6-22 db2pd -inst -memsets output

► Database and application level shared memory:

Figure 6-23 shows the db2pd -db bcukit -memsets output for logical 
database partition 1 . 


Figure 6-23 Database level shared memory allocations


After issuing the command with the -alldbp flag, we have verified that there
are no application control shared memory allocations ("AppXXXXX") on any
database partition, so we can filter them out with the following db2pd command:

db2pd -db bcukit -memsets -alldbp | egrep "BCUKIT |AppCtl|Cmt" | grep -v Database

Figure 6-24 shows the output. The total database level shared memory is:

Database level shared memory Cmt (KB) = 3161984 + 9984 + 3177536 + 9856 +
3151680 + 9600 + 3144128 + 9792 = 12674560 KB

► Private memory:

To get an estimate of the total private memory used, we can get an output of 
db2pd -dbptnmem -alldbp, as shown in Example 6-50. 

Private Mem Used (KB) = 770240 + 683584 + 600000 + 769792 = 2823616 KB

Example 6-50 Private memory

db2pd -dbptnmem -alldbp | egrep "PRIVATE|Used"

Name      Mem Used (KB)  HWM Used (KB)  Cached (KB)
PRIVATE          770240        1234368       436352
Name      Mem Used (KB)  HWM Used (KB)  Cached (KB)
PRIVATE          683584        1209024       357824
Name      Mem Used (KB)  HWM Used (KB)  Cached (KB)
PRIVATE          600000        1167616       262912
Name      Mem Used (KB)  HWM Used (KB)  Cached (KB)
PRIVATE          769792        1182208       440576


► Total Memory used: 

From a DB2 perspective, the total amount of memory currently used on the 
system can be roughly accounted for as follows: 

Total amount of memory used 

= Instance level shared memory + Database level shared memory + Private memory used 
= 1399880 KB + 12674560 KB + 2823616 KB 
= 16898056 KB 
= 16.1 GB 

So, DB2 is using approximately 16 GB of memory out of the 64 GB available on
this system.
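The arithmetic of this estimate can be reproduced with a short script. The following Python sketch only re-adds the committed sizes quoted in this section; the variable and function names are illustrative and are not part of any DB2 tool.

```python
# Committed sizes in KB, copied from the db2pd outputs quoted in this section.
instance_sets_kb = [12480, 22592, 39240, 1325568]
database_sets_kb = [3161984, 9984, 3177536, 9856,
                    3151680, 9600, 3144128, 9792]
private_used_kb = [770240, 683584, 600000, 769792]


def kb_to_gb(kb):
    """Convert KB to GB using binary units (1 GB = 1024 * 1024 KB)."""
    return kb / (1024 * 1024)


instance_kb = sum(instance_sets_kb)
database_kb = sum(database_sets_kb)
private_kb = sum(private_used_kb)
total_kb = instance_kb + database_kb + private_kb

print(f"Instance level shared memory: {instance_kb} KB")
print(f"Database level shared memory: {database_kb} KB")
print(f"Private memory used:          {private_kb} KB")
print(f"Total: {total_kb} KB = {kb_to_gb(total_kb):.1f} GB")
```

Keeping the three categories as separate lists makes it easy to substitute the values collected on another data node.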


6.3.4 DB2 network usage 

The main network usage with an IBM Smart Analytics System is generally the
DB2 FCM internal communication between database partitions. To understand
this network usage and establish a baseline, monitor the FCM activity with the
db2top utility. Its Partitions screen provides live information about the traffic
in terms of buffers received and sent per second. You can get to the Partitions
screen by pressing p, as shown in Figure 6-25.

Figure 6-25 db2top Partitions Screen


Beyond the question of a network bottleneck, an uneven usage of FCM resources
(except on the administration node, which communicates with the rest of the
nodes) can be an indication of data skew. The exception is when you have
defined custom database partition groups, so that the workload is isolated to
the few database partitions belonging to the same database partition group.

The relational monitoring function MON_GET_CONNECTION gives you the details of
the FCM volume per application, so that you can narrow down the FCM resource
usage per application and detect the applications consuming the most FCM
buffers, as shown in Example 6-51.

Example 6-51 FCM buffer usage

SELECT APPLICATION_HANDLE AS AGENT_ID,
       MEMBER, FCM_RECV_VOLUME AS FCM_RCV_BYTES,
       FCM_RECVS_TOTAL AS FCM_RCV_BUFFS,
       FCM_SEND_VOLUME AS FCM_SND_BYTES,
       FCM_SENDS_TOTAL AS FCM_SND_BUFFS
FROM TABLE(MON_GET_CONNECTION(cast(NULL as bigint), -2))
ORDER BY APPLICATION_HANDLE, MEMBER

AGENT_ID MEMBER FCM_RCV_BYTES FCM_RCV_BUFFS FCM_SND_BYTES FCM_SND_BUFFS
      51      0             0             0
      51      1             0             0
      51      2             0             0
      51      3             0             0
      51      4             0             0
      51      5             0             0             0             0
      51      6        471905           160         19160            20
      51      7             0             0             0             0
      51      8        471905           160         19160            20
      78      0       4503616          1200          3464             8
      78      1             0             0             0             0
      78      2             0             0             0             0
      78      3             0             0             0             0
      78      4             0             0             0             0
      78      5             0             0             0             0
      78      6             0             0             0             0
      78      7             0             0             0             0
      78      8             0             0             0             0

  18 record(s) selected.


In this example, we notice that application handle 51 is driving an uneven FCM 
resource usage on database partitions 0, 6 and 8. 
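
When the result set is large, the same check can be scripted. The following Python sketch is only an illustration: it takes rows shaped like the MON_GET_CONNECTION result (a subset of the values returned in Example 6-51 is hard-coded here) and lists, for each application handle, the members with FCM traffic. The function name and the tuple layout are assumptions made for this example.

```python
from collections import defaultdict

# (agent_id, member, fcm_rcv_bytes, fcm_snd_bytes) tuples.
# The values are a subset of the rows returned in Example 6-51.
rows = [
    (51, 5, 0, 0),
    (51, 6, 471905, 19160),
    (51, 7, 0, 0),
    (51, 8, 471905, 19160),
    (78, 0, 4503616, 3464),
    (78, 1, 0, 0),
]


def members_with_fcm_traffic(rows):
    """Return {agent_id: [members]} for members that sent or received FCM data."""
    active = defaultdict(list)
    for agent_id, member, rcv_bytes, snd_bytes in rows:
        if rcv_bytes > 0 or snd_bytes > 0:
            active[agent_id].append(member)
    return dict(active)


for agent_id, members in sorted(members_with_fcm_traffic(rows).items()):
    print(f"agent {agent_id}: FCM traffic on members {members}")
```

An application whose traffic is concentrated on a small subset of the data partitions is a candidate for the data skew investigation described in 6.4.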

Alternatively, the db2pd -fcm output allows you to further narrow down the FCM 
traffic per database partition to a given application handle (agent ID). You can 
then use db2top Sessions screen to narrow down to the particular SQL executed 
by the application handle which might be driving a high FCM consumption. 

In addition, DB2 9.7 Fix Pack 2 has the MON_GET_FCM and 
MON_GET_FCM_CONNECTION_LIST table functions that give FCM monitor 
metrics. 


6.4 Common scenario: Data skew 

In this section, we show an example of uneven resource consumption on an
IBM Smart Analytics System caused by data skew.

6.4.1 Operating system monitoring

Through ongoing monitoring, the system administrator notices an unusual 
pattern of the second data node being used much more than the first data node. 

6.4.2 DB2 monitoring 

In order to narrow down what is causing an uneven resource consumption, we 
can use db2top to check the resource usage pattern from a DB2 perspective. We 
start db2top as follows: 

db2top -d bcukit 

We press J to get to the skew detection screen, as shown by Figure 6-26, and we
notice that there is indeed a skew in the number of rows read and rows written on
database partition 6 and database partition 8. There is also a significant
difference in FCM buffer usage.

We can, for example, rely on the FCM traffic usage and track the applications
sending most of the traffic on database partitions 6 and 8. Here we use the new
monitoring function MON_GET_CONNECTION, which provides the FCM usage
per application, as shown in Example 6-52.

Example 6-52 FCM usage


SELECT APPLICATION_HANDLE AS AGENT_ID,
       MEMBER, ROWS_READ, ROWS_MODIFIED,
       FCM_RECV_VOLUME AS FCM_RCV_BYTES,
       FCM_SEND_VOLUME AS FCM_SND_BYTES
FROM TABLE(MON_GET_CONNECTION(cast(NULL as bigint), -2))
WHERE MEMBER=6 OR MEMBER=8 ORDER BY FCM_SND_BYTES DESC

AGENT_ID MEMBER ROWS_READ ROWS_MODIFIED FCM_RCV_BYTES FCM_SND_BYTES
     107      8  71924735            92        680626       3034036
      84      8  71513627            91        618100       3025924
      89      8  71513627            91        617804       3025924
      93      8  71565879            90        632696       3025776
     127      8  71565067            91        621712       3025480
     108      8  71488495            91        617212       3021424
      94      8  71488112            90        605933       3021424
      96      8  71463759            91        629380       3017664
      88      8  64594919                      646046       2738844
     105      8  59424897                      560726       2514733
     107      6  55810061                      692496       2372923
      95      8  55703475                      536982       2359128
     126      6  54965316                      640065       2339884
      98      8  55115256                      477178       2338108
      84      6  51653227                      590509       2189227
     108      6  50723946                      592873       2148668
      88      6  50377728                      593909       2140557
      87      6  50420967                      798972       2139816
     118      8  50514698                      506459       2139811
      81      6  50667868                      575465       2132296
     125      6  50318861                      588225       2120277
     127      6  49795137                      576945       2112018
     112      6  49684008                      649952       2109146
     104      6  49417721                      576797       2103906
     119      6  49706626                      564333       2099702
      95      6  49244202                      550181       2083923
     105      6  48634131                      561017       2071755
     122      6  48746276                      468174       2070423
      78      6  48634633                      601990       2067403
     113      6  48601675                      568537       2067403
     116      6  48253805                      495677       2050587
      92      6  48427260                      468322       2050143
     121      6  47559034                      631969       2022937

  102 record(s) selected.


We notice that a few applications are the top FCM senders and receivers, and
that many of them show about the same FCM usage. We pick the top two agent
IDs and find out what query they are executing using db2top.

We press l (lowercase L) to get to the Sessions screen in db2top, then press a,
and enter 107 when prompted for the agent ID, as shown in Figure 6-27.

Figure 6-28 shows the Session details screen. We review the SQL being
executed by the application. It is a simple SQL statement selecting data from
only one table, TPCD.PARTSKW.

We do the same investigation for the second application with agent ID 84, and it
turns out that the application is executing the same query.

At this point, we can further check the table and get db2look output to verify its
distribution key, as shown in Example 6-53. Then we can run a query to verify
the distribution of the table.

Example 6-53 Using db2look to obtain DDL

# db2look -e -d bcukit -z TPCD -t PARTSKW
-- No userid was specified, db2look tries to use Environment variable USER
-- USER is: BCULINUX
-- Specified SCHEMA is: TPCD
-- The db2look utility will consider only the specified tables
-- Creating DDL for table(s)
-- Schema name is ignored for the Federated Section
-- This CLP file was created using DB2LOOK Version "9.7"
-- Timestamp: Thu 07 Oct 2010 05:30:45 PM EST
-- Database Name: BCUKIT
-- Database Manager Version: DB2/LINUXX8664 Version 9.7.2
-- Database Codepage: 1208
-- Database Collating Sequence is: IDENTITY

CONNECT TO BCUKIT;

-- DDL Statements for table "TPCD "."PARTSKW"

CREATE TABLE "TPCD "."PARTSKW" (
    "P_PARTKEY" INTEGER NOT NULL ,
    "P_NAME" VARCHAR(55) NOT NULL ,
    "P_MFGR" CHAR(25) NOT NULL ,
    "P_BRAND" CHAR(10) NOT NULL ,
    "P_TYPE" VARCHAR(25) NOT NULL ,
    "P_SIZE" INTEGER NOT NULL ,
    "P_CONTAINER" CHAR(10) NOT NULL ,
    "P_RETAILPRICE" DOUBLE NOT NULL ,
    "P_COMMENT" VARCHAR(23) NOT NULL )
    COMPRESS YES
    DISTRIBUTE BY HASH("P_MFGR")
    IN "OTHERS" ;

-- DDL Statements for indexes on Table "TPCD "."PARTSKW"

CREATE UNIQUE INDEX "TPCD "."PPKSKW" ON "TPCD "."PARTSKW"
    ("P_PARTKEY" ASC,
     "P_MFGR" ASC)
    PCTFREE 0
    COMPRESS NO ALLOW REVERSE SCANS;

COMMIT WORK;

CONNECT RESET;

TERMINATE;

We can check the data distribution using an SQL query, as shown in
Example 6-54. Summing the cardinality of each node, the query shows that the
entire table has approximately 200 million rows. We can also see that roughly
60% of the rows are located on database partition 6 and 40% on database
partition 8. No rows are located on any other database partition. These are the
partitions that show the highest FCM buffer usage in Example 6-52 on page 197.
Based on this output, we have a very significant data skew.

Example 6-54 Data distribution for TPCD.PARTSKW

# db2 "select dbpartitionnum(p_mfgr) as NODE_NUMBER, count(*) as NODE_CARD from
tpcd.partskw group by dbpartitionnum(p_mfgr) order by dbpartitionnum(p_mfgr)"

NODE_NUMBER NODE_CARD
          6 120000973
          8  79999027

  2 record(s) selected.
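
The percentages quoted for this distribution follow directly from the node cardinalities. As a small illustration, the following Python sketch recomputes the share of rows held by each database partition; the dictionary simply restates the two rows returned by the query, and the function name is hypothetical.

```python
# Rows per database partition, as returned by the query in Example 6-54.
node_card = {6: 120000973, 8: 79999027}


def row_shares(node_card):
    """Return the total row count and the percentage of rows per partition."""
    total = sum(node_card.values())
    shares = {node: 100.0 * rows / total for node, rows in node_card.items()}
    return total, shares


total_rows, shares = row_shares(node_card)
print(f"Total rows: {total_rows}")
for node, pct in sorted(shares.items()):
    print(f"Partition {node}: {pct:.1f}% of rows")
```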


We can further verify the domain for the column P_MFGR as shown in 
Example 6-55. We notice that only three unique values are currently being used 
in this column, which does not make this particular column an ideal distribution 
key. 

Example 6-55 Domain for column P_MFGR

# db2 "select distinct p_mfgr, count(*) from tpcd.partskw group by p_mfgr"

P_MFGR                    2
Manufacturer#1            79998128
Manufactured              40002845
Manufactured              79999027

  3 record(s) selected.
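
To see why a column with so few distinct values distributes poorly, consider that every row with the same key value hashes to the same database partition, so a key with three distinct values can populate at most three partitions. The following Python sketch illustrates this under loud assumptions: it uses CRC32 as a stand-in hash (it is not the DB2 hashing function or distribution map), it assumes eight data partitions numbered 1 through 8, and the manufacturer names are hypothetical sample values.

```python
import zlib


def partition_for(key, partitions):
    """Map a key to a partition with a stand-in hash (CRC32), not DB2's own hash."""
    return partitions[zlib.crc32(str(key).encode("utf-8")) % len(partitions)]


def partitions_used(keys, partitions):
    """Return the set of partitions that receive at least one key."""
    return {partition_for(key, partitions) for key in keys}


data_partitions = list(range(1, 9))  # assumption: eight data partitions

# Hypothetical low-cardinality key values (three distinct values only).
low_cardinality_keys = ["Manufacturer#1", "Manufacturer#2", "Manufacturer#3"]
# A high-cardinality key, such as a part number.
high_cardinality_keys = range(10000)

low_used = partitions_used(low_cardinality_keys, data_partitions)
high_used = partitions_used(high_cardinality_keys, data_partitions)

print(f"3 distinct values populate {len(low_used)} partition(s)")
print(f"10000 distinct values populate {len(high_used)} partition(s)")
```

Whatever the hash function, the low-cardinality key cannot spread the rows over more partitions than it has distinct values, which is why a high-cardinality column is generally a better distribution key candidate.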


In this particular scenario, we showed that the uneven resource usage is caused
by data skew. The data skew results from a poor distribution key choice.

For further guidelines on distribution keys, see the following documentation:

► DB2 Information Center:

http://publib.boulder.ibm.com/infocenter/db2luw/v9r7/topic/com.ibm.db2.luw.admin.partition.doc/doc/c0004906.html

► IBM developerWorks article, Choosing partitioning keys in DB2 Database
Partitioning Feature environments:

http://www.ibm.com/developerworks/data/library/techarticle/dm-1005partitioningkeys/


Chapter 7. Advanced configuration and tuning


In this chapter we discuss details of the configuration of your IBM Smart
Analytics System for your particular environment. We provide a configuration
parameter summary for the IBM Smart Analytics System 5600 V1/V2, 7600, and
7700. We discuss parameters that can be adjusted to meet your specific
workload needs.

We also describe how to configure DB2 workload manager to minimize the 
performance degradation seen due to a concurrent or conflicting use of 
resources. 

In certain cases, additional tuning or workload management might not be 
sufficient or appropriate to resolve the performance degradation experienced and 
to meet your service level agreement (SLA). The answer in these cases might 
reside in increasing the resources available on the existing physical partitions, 
such as additional memory or a Solid State Drive (SSD), to provide better 
performance, or scaling up your system by adding partitions. We
discuss these options in the last section.

7.1 Configuration parameters


The IBM Smart Analytics System is designed to provide optimal performance for 
business intelligence workloads and comes with a prescribed configuration. The 
configuration is tailored based on the hardware specifications, and the entire 
architecture of the solution. It integrates the best practices in the field and takes 
into account the customer’s typical business intelligence environment workloads 
and constraints to provide a good starting point for most customer environments. 

In this section, we discuss the various types of parameters, along with guidelines
on whether these parameters can be changed. We review the parameters
for the IBM Smart Analytics System 7600 and 7700, as well as the IBM Smart
Analytics System 5600 V1 and V2. Certain 7600 configurations use DB2 9.5.
However, this chapter focuses only on configurations that use DB2 9.7.

Configuration parameters are also described for specific environments in the
official IBM Smart Analytics System documentation, available for download at the
following link:

https://www14.software.ibm.com/webapp/iwm/web/preLogin.do?lang=en_US&source=idwbcu

These parameters are specific to the IBM Smart Analytics System, and might not
be appropriate for tuning other environments. If your system differs from what
is described in this section, check the latest official IBM Smart Analytics
System documentation. If there are discrepancies between the official
documentation and your original system settings, contact your IBM Smart
Analytics System Support.

7.1.1 Operating system and kernel parameters 

In this section, we discuss the various operating system and kernel parameters
for the AIX-based and the Linux-based environments.

AIX: 7600 and 7700 based environments

The AIX kernel parameters to use are almost identical for the IBM Smart
Analytics System 7600 and 7700. Differences, if any, are mentioned in this
section. Details about the meaning of the parameters, the commands to set
them, and how to display these parameters are documented in the AIX 6.1
Information Center at this address:

http://publib.boulder.ibm.com/infocenter/aix/v6r1/index.jsp

All the commands must be submitted as root unless specified otherwise.

Important: Do not deviate from the AIX kernel parameters described
in this section without consulting your IBM Smart Analytics System Support.

AIX Virtual Memory Manager 

IBM Smart Analytics System 7600 and 7700 environments use the default AIX 
V6.1 Virtual Memory Manager (VMM) settings. These parameters are accessible 
through the AIX vmo command. You can use the following vmo command to 
display these parameters: 
vmo -L 

Example 7-1 shows an output of vmo -L to display VMM parameters on an IBM 
Smart Analytics System 7700 environment. 

Example 7-1 Virtual Memory Manager parameters for the 7700

(0) root @ isas77adm: 6.1.0.0: /
# vmo -L
NAME                      CUR    DEF    BOOT   MIN    MAX    UNIT           TYPE
     DEPENDENCIES
ame_cpus_per_pool         n/a    8      8      1      1K     processors     B
ame_maxfree_mem           n/a    24M    24M    320K   16G    bytes          D
     ame_minfree_mem
ame_min_ucpool_size       n/a    0      0      5      95     % memory       D
ame_minfree_mem           n/a    8M     8M     64K    16383M bytes          D
ams_loan_policy           n/a    1      1      0      2      numeric        D
enhanced_affinity_affin_time
                          1      1      1      0      100    numeric        D
enhanced_affinity_vmpool_limit
                          10     10     10     -1     100    numeric        D
force_relalias_lite       0      0      0      0      1      boolean        D
kernel_heap_psize         64K    0      0      0      16M    bytes          B
lgpg_regions              0      0      0      0      8E-1                  D
     lgpg_size
lgpg_size                 0      0      0      0      16M    bytes          D
     lgpg_regions
low_ps_handling           1      1      1      1      2                     D
maxfree                   1088   1088   1088   16     25548K 4KB pages      D


maxpin%                   80     80     80     1      100    % memory
     pinnable_frames
     memory_frames
memory_frames             31936K        31936K               4KB pages
memplace_data             0      0      0      0      2
memplace_mapped_file      0      0      0      0      2
memplace_shm_anonymous    0      0      0      0      2
memplace_shm_named        0      0      0      0      2
memplace_stack            0      0      0      0      2
memplace_text             0      0      0      0      2
memplace_unmapped_file    0      0      0      0
minfree                   960    960    960    8      25548K 4KB pages
npskill                   96K    96K    96K           12M-1  4KB pages
npswarn                   384K   384K   384K          12M-1  4KB pages
numpsblks                                                    4KB blocks
pinnable_frames                                              4KB pages
relalias_percentage                                   32K-1



n/a means parameter not supported by the current platform or kernel

Parameter types:
    S = Static: cannot be changed
    D = Dynamic: can be freely changed
    B = Bosboot: can only be changed using bosboot and reboot
    R = Reboot: can only be changed during reboot
    C = Connect: changes are only effective for future socket connections
    M = Mount: changes are only effective for future mountings
    I = Incremental: can only be incremented
    d = deprecated: deprecated and cannot be changed

Value conventions:
    K = Kilo: 2^10       G = Giga: 2^30       P = Peta: 2^50
    M = Mega: 2^20       T = Tera: 2^40       E = Exa: 2^60

I/O tuning parameters

IBM Smart Analytics System 7600 and 7700 environments use the default AIX 
V6.1 I/O tuning parameters, except for these: 

► j2_minPageReadAhead=32 

This parameter represents the minimum number of pages read ahead when 
VMM first detects a sequential reading pattern. 

► j2_maxPageReadAhead=512 

This parameter represents the maximum number of pages that VMM can 
read ahead during a sequential access. 

These parameters are beneficial for buffered file system I/O. DB2 does not use 
file system caching for the bulk of its I/O activity, such as table space access as 
well as active transaction log files. Therefore, these parameters will not have a 
significant impact on DB2 performance. However, they might benefit any 
application or utility performing buffered file system I/O on the system. 

You can access these parameters through the AIX ioo command. The following 
command can be used to check these parameters: 
ioo -L 

Example 7-2 shows how to set these parameters with ioo. 

Example 7-2 Setting I/O tuning parameters with ioo on a 7700 
# ioo -p -o j2_minPageReadAhead=32 -o j2_maxPageReadAhead=512 
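
To get a feel for what these page counts mean in terms of data volume, the following Python sketch converts them to bytes. It assumes a 4 KB page size, which is an assumption of this example rather than a value stated in this section, and the function name is illustrative.

```python
PAGE_SIZE_BYTES = 4096  # assumption: 4 KB pages


def readahead_bytes(pages, page_size=PAGE_SIZE_BYTES):
    """Return the amount of data covered by a read-ahead of the given page count."""
    return pages * page_size


min_kb = readahead_bytes(32) // 1024    # j2_minPageReadAhead=32
max_kb = readahead_bytes(512) // 1024   # j2_maxPageReadAhead=512

print(f"j2_minPageReadAhead=32  -> {min_kb} KB")
print(f"j2_maxPageReadAhead=512 -> {max_kb} KB ({max_kb // 1024} MB)")
```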


Network parameters 

The following kernel network parameters are changed from the default value in 
order to provide an optimal performance on the network: 

► sb_max=1310720

This parameter represents the maximum buffer space that can be used by a
socket send or receive buffer. This value caps the maximum size that can be
set for udp_sendspace, udp_recvspace, tcp_sendspace, and tcp_recvspace.

► rfc1323=1

This parameter enables the TCP window scaling option. With this parameter set,
the maximum TCP window size can grow up to 4 GB.

► ipqmaxlen=250

This parameter sets the maximum internet protocol (IP) input queue length to
250. This value is increased to limit any input queue overflow. You can monitor
overflows using the following command:

netstat -p ip

Example 7-3 shows an example of this command on a 7700 system.

Example 7-3 Monitoring IP input queue

# netstat -p ip | grep overflows
0 ipintrq overflows


► udp_sendspace=65536

This parameter represents the size of the largest User Datagram Protocol
(UDP) datagram that can be sent.

► udp_recvspace=655360

This parameter represents the amount of incoming data that can be queued
on each UDP socket. udp_recvspace is set to 10x udp_sendspace to provide
buffering according to best practice.

► tcp_sendspace=221184

This parameter specifies how many bytes of data can be buffered in the
kernel (using the mbufs kernel memory buffers) by the TCP sending socket
before getting blocked. For IBM Smart Analytics System, the value is equal to
the tcp_recvspace parameter.

► tcp_recvspace=221184

This parameter specifies how many bytes of data can be buffered in the
kernel (using the mbufs kernel memory buffers) on the receiving socket
queue. This value is significant because it is used by the TCP protocol to
determine the TCP window size and limit the number of bytes it sends to a
receiver.

The following AIX command can show the current settings in your environment: 
no -L 

Example 7-4 shows an example of how to set these parameters with no. 

Example 7-4 Setting network parameters with no

no -p -o sb_max=1310720 -o rfc1323=1 -o udp_sendspace=65536 -o
udp_recvspace=655360 -o tcp_sendspace=221184 -o tcp_recvspace=221184
no -r -o ipqmaxlen=250
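
The relationships between these values (the send and receive spaces are capped by sb_max, udp_recvspace is ten times udp_sendspace, and the two TCP spaces are equal) can be checked mechanically. The following Python sketch is a hypothetical consistency check over a dictionary of values; it does not read the settings from the system, and the function name is an assumption of this example.

```python
# Values prescribed in this section for the 7600 and 7700.
params = {
    "sb_max": 1310720,
    "udp_sendspace": 65536,
    "udp_recvspace": 655360,
    "tcp_sendspace": 221184,
    "tcp_recvspace": 221184,
}


def check_network_params(p):
    """Return a list of human-readable problems; an empty list means consistent."""
    problems = []
    for name in ("udp_sendspace", "udp_recvspace",
                 "tcp_sendspace", "tcp_recvspace"):
        if p[name] > p["sb_max"]:
            problems.append(f"{name} exceeds sb_max")
    if p["udp_recvspace"] != 10 * p["udp_sendspace"]:
        problems.append("udp_recvspace is not 10x udp_sendspace")
    if p["tcp_sendspace"] != p["tcp_recvspace"]:
        problems.append("tcp_sendspace differs from tcp_recvspace")
    return problems


print(check_network_params(params) or "settings are consistent")
```

On a live system, the dictionary could be filled from the output of no -L before running the check.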
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Jumbo frames 

The network interfaces for the internal application network, also known as the 
DB2 FCM network, have jumbo frames enabled. The following command enables 
jumbo frames: 

chdev -l ent2 -a jumbo_frames=yes

In this command, ent2 is the network interface corresponding to the internal 
application network used for the DB2 FCM internal communications between 
database partitions. 

Example 7-5 shows the command to display the settings of a network interface 
device on an IBM Smart Analytics System 7700. 

Example 7-5 Displaying a network interface device settings

# lsattr -El ent2
alt_addr       0x000000000000 Alternate ethernet address                    True
busintr        74261          Bus interrupt level                           False
busmem         0xff9f0000     Bus memory address                            False
chksum_offload yes            Enable hardware transmit and receive checksum True
delay_open     no             Delay open until link state is known          True
flow_ctrl      yes            N/A                                           True
intr_priority  3              Interrupt priority                            False
iomem          0xff000        Bus I/O address                               False
jumbo_frames   yes            Enable jumbo frames                           True
large_receive  yes            Enable receive TCP segment aggregation        True
large_send     yes            Enable hardware transmit TCP segmentation     True
rom_mem        0xffb00000     ROM memory address                            False
rx_coalesce    16             Receive packet coalesce count                 True
rx_intr_delay  100            Receive Interrupt Delay timer                 True
rx_intr_limit  1000           Max receive buffers processed per interrupt   True
rx_ipkt_idelay 4              Inter-packet Interrupt Delay timer            True
rxdesc_que_sz  1024           Receive descriptor queue size                 True
tx_que_sz      8192           Software transmit queue size                  True
txdesc_que_sz  512            Transmit descriptor queue size                True
use_alt_addr   no             Enable alternate ethernet address             True


Fibre Channel device settings 

The following Fibre Channel parameters are set for all Fibre Channel devices on
the system. These parameters help provide optimal performance for the large
sequential I/O block sizes that DB2 uses during sequential prefetching:

► lg_term_dma=0x1000000

This parameter controls the direct memory access (DMA) memory in bytes
that the Fibre Channel adapter can use.

► max_xfer_size=0x100000

This parameter determines the maximum I/O transfer size in bytes that the
adapter can support.

► num_cmd_elems=1024

This parameter determines the maximum number of commands that can be
queued to the adapter.

You can use the following command to display the settings for your Fibre Channel
devices:

lsattr -El fcs0

In this command, fcs0 represents the Fibre Channel device. The devices fcs0
through fcs3 are configured in an IBM Smart Analytics System 7600 or 7700
environment.

Example 7-6 shows an example of lsattr -El output for a Fibre Channel device 
on a 7700. 


Example 7-6 Displaying the Fibre Channel adapter settings

# lsattr -El fcs0
bus_io_addr   0xff800    Bus I/O address                                    False
bus_mem_addr  0xff9f8000 Bus memory address                                 False
init_link     auto       INIT Link flags                                    True
intr_msi_1    254487     Bus interrupt level                                False
intr_priority 3          Interrupt priority                                 False
lg_term_dma   0x1000000  Long term DMA                                      True
link_speed    auto       Link Speed Setting                                 True
max_xfer_size 0x100000   Maximum Transfer Size                              True
num_cmd_elems 1024       Maximum number of COMMANDS to queue to the adapter True
pref_alpa     0x1        Preferred AL_PA                                    True
sw_fc_class   3          FC Class for Fabric                                True


Example 7-7 shows how to set these parameters using the chdev -l command.

Example 7-7 Using chdev -l to set Fibre Channel adapter parameters

chdev -l fcs0 -a lg_term_dma=0x1000000 -a max_xfer_size=0x100000 -a
num_cmd_elems=1024
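
Because these two attributes are expressed in hexadecimal, it can help to convert them to more familiar units. The following Python sketch decodes the values used in this section; the dictionary and function names are illustrative only.

```python
# Hexadecimal attribute values as set on the Fibre Channel adapters.
settings = {
    "lg_term_dma": "0x1000000",
    "max_xfer_size": "0x100000",
}


def hex_to_bytes(hex_value):
    """Convert a hexadecimal string such as '0x100000' to an integer byte count."""
    return int(hex_value, 16)


sizes = {name: hex_to_bytes(value) for name, value in settings.items()}
for name, size in sizes.items():
    print(f"{name} = {size} bytes = {size // (1024 * 1024)} MB")
```

With these values, the long-term DMA area is 16 MB and the maximum transfer size is 1 MB.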


Hdisk device settings

The following hdisk device settings are set for the external storage hdisks on the
IBM Smart Analytics System 7600 and 7700, all of which use the AIX PCM MPIO
(Multiple Path I/O) driver.

► max_transfer=0x100000

This parameter determines the maximum transfer size in bytes in a single
operation. Larger I/O block requests exceeding this size will be broken into
smaller block sizes by the MPIO driver.

► queue_depth=128

This parameter represents the maximum number of requests that can be
queued to the device.

► reserve_policy=no_reserve

This parameter represents the reservation policy for the device. The no_reserve
setting does not apply a reservation methodology for the device. The device
might be accessed by other initiators, which might be on other host systems.

► algorithm=round_robin

This parameter represents the algorithm used by the MPIO driver to distribute
I/O across the multiple paths for the device. The round_robin setting
distributes the I/O across all paths configured.

You can use the following command to display your hdisk settings:

lsattr -El hdisk10

In this command, hdisk10 represents a LUN on the external storage.

Example 7-8 shows how to set these parameters.

Example 7-8 Setting hdisk parameters

chdev -l hdisk10 -a max_transfer=0x100000 -a queue_depth=128 -a
reserve_policy=no_reserve -a algorithm=round_robin


IOCP enablement 

DB2 9.7 supports I/O Completion Port (IOCP) for asynchronous I/O by default. 

Example 7-9 shows how to check if IOCP is enabled at the AIX level. The output 
shows IOCP in “Available” state. 

Example 7-9 Checking IOCP enablement

# lsdev -Cc iocp
iocp0 Available I/O Completion Ports


To make sure that IOCP works with DB2, you can monitor the db2diag.log DB2
diagnostics log file when the instance is starting up. You get a warning
message if IOCP is not enabled.

Example 7-10 shows an example of a db2diag.log message when IOCP is not
enabled on the system.

Example 7-10 IOCP message in db2diag.log

2010-09-17-09.25.05.888484-300 E3313413A406 LEVEL: Warning
PID : 54919616 TID : 258 PROC : db2sysc 1
INSTANCE: bcuaix NODE : 001
EDUID : 258 EDUNAME: db2sysc 1
FUNCTION: DB2 UDB, oper system services, sqloStartAIOCollectorEDUs, probe:30
MESSAGE : ADM0513W db2start succeeded. However, no I/O completion port (IOCP)
is available.


Maximum number of processes per user 

The AIX maxuproc parameter is set to 4096. The following command, issued as
root, sets maxuproc to 4096:

chdev -l sys0 -a maxuproc=4096

Example 7-11 shows how to verify the maxuproc setting on a 7700.

Example 7-11 Verifying maxuproc setting

# lsattr -El sys0 | grep maxuproc
maxuproc 4096 Maximum number of PROCESSES allowed per user True


User limits 

The following user limits are set to unlimited for all the users on the system: 

► Core size (core) 

► Data size (data) 

► File size (fsize) 

► Number of open file descriptors (nofiles) 

► Stack size (stack) - with a hard limit of 4 GB 

You can update their values in the /etc/security/limits file on each node of the
cluster. All these values can be set to -1 (unlimited) in the default stanza, and
stack_hard can be set to 4194304.

Example 7-12 shows how to update the /etc/security/limits file.

Example 7-12 Setting user limits using /etc/security/limits file

default:
        fsize = -1
        core = -1
        cpu = -1
        data = -1
        rss = -1
        stack = -1
        stack_hard = 4194304
        nofiles = -1

Another method consists of using the chuser command as root for all users on
the system. Example 7-13 shows an example of chuser usage for this purpose.

Example 7-13 Set user limits using chuser

chuser core=-1 data=-1 fsize=-1 nofiles=-1 stack=-1 stack_hard=4194304 bcuaix


Linux: IBM Smart Analytics System 5600 V1 and 5600 V2
environments

In this section we list all the kernel parameter settings for the IBM Smart
Analytics System 5600 V1 and 5600 V2 environments with or without SSD
options. The kernel parameter settings apply to all environments, unless
specified otherwise (for example, kernel IPC parameters). At the end of the
section, there is an example of how to update these parameters.

Kernel parameters

The following Linux kernel parameters are set for the IBM Smart Analytics
System 5600 V1 and 5600 V2:

► Kernel IPC parameters:

Table 7-1 lists the IPC related kernel parameters and their settings on the IBM
Smart Analytics System 5600 V1 and 5600 V2.


Table 7-1 IPC related kernel parameters

Kernel parameter        Meaning                                                  5600 V1          5600 V2
kernel.msgmni (MSGMNI)  Maximum number of system-wide message queue identifiers  16384            131072
kernel.msgmax (MSGMAX)  Maximum size of a message that can be sent by a process  Default (65536)  65536
kernel.msgmnb (MSGMNB)  Maximum number of bytes in a message queue               Default          65536
kernel.sem (SEMMSL)     Maximum number of semaphores per array                   250              250
kernel.sem (SEMMNS)     Maximum number of semaphores system wide                 256000           256000
kernel.sem (SEMOPM)     Maximum number of operations in a single semaphore call  32               32
kernel.sem (SEMMNI)     Maximum number of semaphore arrays                       8192             32768
kernel.shmmni (SHMMNI)  Maximum number of shared memory segments                 32768            32768
kernel.shmmax (SHMMAX)  Maximum size in bytes of a shared memory segment         Default          128 000 000 000
kernel.shmall (SHMALL)  Maximum amount of shared memory that can be allocated    Default          256 000 000 000

IPC resources: The Linux kernel parameters related to IPC resources are 
a good starting point for IBM Smart Analytics System 5600 VI and 5600 
V2. If there is suspicion or evidence of DB2 errors due to a shortage of IPC 
resources, consult with your IBM Smart Analytics System support. 


To check the value of a parameter, read the corresponding file under 
/proc/sys/kernel. Example 7-14 shows how to display these values. 

Example 7-14 Displaying the value of IPC related kernel parameters 

# cat /proc/sys/kernel/msgmni 
131072 
# cat /proc/sys/kernel/sem 
250 256000 32 32768 
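
The single line returned for kernel.sem packs four of the limits from Table 7-1 into one value, in the order SEMMSL, SEMMNS, SEMOPM, SEMMNI. The following minimal Python sketch splits that line into named fields. The function and class names are our own and are used only for illustration; the sample string is the value displayed in Example 7-14.

```python
from typing import NamedTuple


class SemLimits(NamedTuple):
    semmsl: int  # maximum number of semaphores per array
    semmns: int  # maximum number of semaphores system wide
    semopm: int  # maximum number of operations in a single semaphore call
    semmni: int  # maximum number of semaphore arrays


def parse_kernel_sem(text):
    """Split the content of /proc/sys/kernel/sem into its four fields."""
    fields = text.split()
    if len(fields) != 4:
        raise ValueError("expected 4 fields, got %d" % len(fields))
    return SemLimits(*(int(field) for field in fields))


# Value displayed in Example 7-14; on a live system, read the file instead.
limits = parse_kernel_sem("250 256000 32 32768")
print(limits)
```

On a running node, the same function can be fed the content of /proc/sys/kernel/sem, which makes it easy to compare each field against the column of Table 7-1 that matches your environment.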


You can use the ipcs -l command to display the IPC related kernel 
parameters in effect for your system. Example 7-15 shows an output from an 
IBM Smart Analytics System 5600 V1 system. 

Example 7-15 Displaying IPC related kernel parameters 

# ipcs -l 

------ Shared Memory Limits -------- 
max number of segments = 16128 
max seg size (kbytes) = 18014398509481983 
max total shared memory (kbytes) = 4611686018427386880 
min seg size (bytes) = 1 

------ Semaphore Limits -------- 
max number of arrays = 8192 
max semaphores per array = 250 
max semaphores system wide = 256000 
max ops per semop call = 32 
semaphore max value = 32767 

------ Messages: Limits -------- 
max queues system wide = 16384 
max size of message (bytes) = 65536 
default max size of queue (bytes) = 65536 


Important: For all other Linux kernel parameters listed next, do not deviate 
from the configuration listed without consulting the IBM Smart Analytics 
System support services. 


► kernel.suid_dumpable=1 

This kernel parameter controls whether a core file can be dumped from a suid 
program such as DB2. This setting enables DB2 core dump files for problem 
description and problem source identification (PD/PSI) by IBM Smart 
Analytics System Support services. You can verify this parameter setting by 
running the following command: 

cat /proc/sys/kernel/suid_dumpable 

► kernel.randomize_va_space=0 

This kernel parameter disables the Linux address space randomization. 
You need to disable this feature because it can cause errors with the DB2 
backup utility or log archival process. You can verify this parameter setting by 
running the following command: 

cat /proc/sys/kernel/randomize_va_space 

► vm.swappiness=0 

This parameter determines the kernel preference for using swap space 
versus RAM. When it is set to the minimum value of 0, the kernel does not 
favor swap space usage at all. With this configuration, the kernel generally 
delays swapping until it becomes necessary. You can verify this 
parameter setting by running the following command: 

cat /proc/sys/vm/swappiness 

► vm.dirty_background_ratio=5 and vm.dirty_ratio=10 

The vm.dirty_background_ratio parameter represents the percentage of dirty 
pages resulting from I/O write operations that triggers a background flush of 
the pages. This parameter works in conjunction with vm.dirty_ratio. 

The vm.dirty_ratio parameter represents the percentage of dirty pages 
in memory resulting from I/O write operations at which a flush is forced 
on the system, causing I/O writes to be blocked until the flush completes. 
These settings help to limit dirty page caching. On the 5600 environments, 
DB2 does not use file system caching for table spaces and transaction log 
files, and is not impacted by this setting. 

All the kernel settings listed previously can be updated in the /etc/sysctl.conf file 
on each node in the cluster. Example 7-16 shows an example of /etc/sysctl.conf 
from an IBM Smart Analytics System 5600 V1 environment. 

Example 7-16 sysctl.conf 

# Disable response to broadcasts. 
# You don't want yourself becoming a Smurf amplifier. 
net.ipv4.icmp_echo_ignore_broadcasts = 1 
# enable route verification on all interfaces 
net.ipv4.conf.all.rp_filter = 1 
# enable ipV6 forwarding 
#net.ipv6.conf.all.forwarding = 1 
kernel.msgmni=16384 
kernel.sem=250 256000 32 32768 
vm.swappiness=0 
vm.dirty_ratio=10 
vm.dirty_background_ratio=5 
kernel.suid_dumpable=1 
kernel.randomize_va_space=0 
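
Because the same file must be maintained on every node, it can be useful to compare a node's sysctl.conf against the settings you expect. The following Python sketch is one possible way to do that, not a tool shipped with the appliance. The helper names (parse_sysctl_conf, find_deviations) are hypothetical, and the expected values are simply those shown in Example 7-16 for a 5600 V1 environment.

```python
def parse_sysctl_conf(text):
    """Parse sysctl.conf-style text into a {parameter: value} dictionary."""
    settings = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        # Skip blank lines and comment lines (they start with '#' or ';').
        if not line or line[0] in "#;":
            continue
        key, sep, value = line.partition("=")
        if sep:
            # Normalize whitespace so "250  256000" compares equal to "250 256000".
            settings[key.strip()] = " ".join(value.split())
    return settings


def find_deviations(actual, expected):
    """Return {parameter: (actual value, expected value)} for every mismatch."""
    return {
        name: (actual.get(name), wanted)
        for name, wanted in expected.items()
        if actual.get(name) != wanted
    }


# Excerpt of Example 7-16; on a node, read /etc/sysctl.conf instead.
SAMPLE = """\
# enable route verification on all interfaces
net.ipv4.conf.all.rp_filter = 1
#net.ipv6.conf.all.forwarding = 1
kernel.msgmni=16384
kernel.sem=250 256000 32 32768
vm.swappiness=0
vm.dirty_ratio=10
vm.dirty_background_ratio=5
kernel.suid_dumpable=1
kernel.randomize_va_space=0
"""

# Expected values taken from Example 7-16 (5600 V1); adjust for your system.
EXPECTED = {
    "kernel.msgmni": "16384",
    "kernel.sem": "250 256000 32 32768",
    "vm.swappiness": "0",
    "vm.dirty_ratio": "10",
    "vm.dirty_background_ratio": "5",
    "kernel.suid_dumpable": "1",
    "kernel.randomize_va_space": "0",
}

settings = parse_sysctl_conf(SAMPLE)
deviations = find_deviations(settings, EXPECTED)
print(deviations)
```

The sketch only inspects the configuration file. To check the values actually in effect, read the corresponding files under /proc/sys as shown earlier in this section.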


After the /etc/sysctl.conf file has been updated, run the following command to 
load these kernel parameters immediately. Because the settings are stored in 
/etc/sysctl.conf, they are also applied at each subsequent reboot: 

sysctl -p 

User limits 

The IBM Smart Analytics System 5600 V1 and V2 use the default ulimit 
parameters, with the exception of nofiles, which is set explicitly to 65536. 
Example 7-17 shows the default parameters in a 5600 V1 environment. 

Example 7-17 User limits of 5600 V1 

# ulimit -a 
core file size          (blocks, -c) 0 
data seg size           (kbytes, -d) unlimited 
file size               (blocks, -f) unlimited 
pending signals                 (-i) 540672 
max locked memory       (kbytes, -l) 32 
max memory size         (kbytes, -m) unlimited 
open files                      (-n) 65536 
pipe size            (512 bytes, -p) 8 
POSIX message queues     (bytes, -q) 819200 
stack size              (kbytes, -s) 8192 
cpu time               (seconds, -t) unlimited 
max user processes              (-u) 540672 
virtual memory          (kbytes, -v) unlimited 
file locks                      (-x) unlimited 


7.1.2 DB2 configuration 

In this section, we provide a summary of the DB2 instance and database 
configuration for the IBM Smart Analytics System 5600 V1/V2, 7600, and 7700. 
For detailed information about any parameter in this section, consult the DB2 9.7 
Information Center at the following link: 

http://publib.boulder.ibm.com/infocenter/db2luw/v9r7/index.jsp 

Configuration and design: The DB2 configuration and design choices on the 
IBM Smart Analytics System result from thorough performance validation 
and testing, and provide strong performance for the hardware specifications 
of the appliance. This configuration also integrates the best practices in the 
field for business intelligence workloads. 

Design choices such as the file system layout and the number of logical 
database partitions per physical partition, must not be modified, because 
doing so constitutes a major deviation from the prescribed configuration. 

Configuration settings closely tied to the hardware specifications of the 
appliance (the DB2_PARALLEL_IO registry variable or NUM_IOSERVERS, for 
example) must not be modified without consulting your IBM Smart Analytics 
System support. Changes to such configuration settings can result in 
performance degradation. 

Parameters related to best practices in the field, such as optimizer related 
registry variable settings, provide you strong performance for the analytical 
query workload. If there is evidence that these parameters are not beneficial 
for your specific query workload or environment, you can disable them. 

DB2 memory related parameters, such as buffer pool sizing or sort heap listed 
for IBM Smart Analytics System offerings, constitute a good starting point for 
most business intelligence workloads. However, these parameters can be 
subject to further tuning and adjustments, depending on your particular 
workload. 

Overall, parameters in the DB2 configuration are a good starting point for most 
workloads. However, further adjustments and tuning might be required 
depending on your specific requirements. 

It is critical to understand the impact and thoroughly test the effect of any 
configuration changes that you plan to make on the environment: 

► Only consider making configuration changes to address performance 
bottlenecks, and only if there is an expected benefit. 

► When tuning DB2, proceed with one change at a time in order to keep track 
and understand the impact of each change. 

Configuration changes that considerably affect the DB2 engine behavior, such as 
turning INTRA_PARALLEL ON, are not desirable. In case of doubt, consult 
IBM Smart Analytics System support. 

Registry variables 

Table 7-2 shows a summary of all registry variable settings for the IBM Smart 
Analytics System 5600 V1/V2, 7600, and 7700 environments. Next, we discuss 
the registry variables that directly affect performance. 


Table 7-2 DB2 registry variables 

                            Linux environment             AIX environment
Registry variable           5600 V1        5600 V2        7600                7700
--------------------------  -------------  -------------  ------------------  -------------
DB2_EXTENDED_OPTIMIZATION   ON             ON             ON                  ON
DB2_ANTIJOIN                YES            EXTEND         ON (DB2 9.5),       EXTEND
                                                          EXTEND (DB2 9.7)
DB2COMM                     TCPIP          TCPIP          TCPIP               TCPIP
DB2_PARALLEL_IO             *:5            *:5            *:8
DB2RSHCMD                   /usr/bin/ssh   /usr/bin/ssh   /usr/bin/ssh        /usr/bin/ssh


DB2_EXTENDED_OPTIMIZATION 

With this setting enabled, the optimizer uses optimization extensions. It is a best 
practice to enable this setting for analytical query workloads, and it is proven to 
provide strong performance for most business intelligence workloads. 

DB2_ANTIJOIN 

► DB2_ANTIJOIN=YES causes the optimizer to search for opportunities to 
transform subqueries of a NOT IN clause into an antijoin that can be 
processed more efficiently. 

► DB2_ANTIJOIN=EXTEND causes the optimizer to also consider 
subqueries of a NOT EXISTS clause. This parameter can benefit most 
queries with a NOT IN clause and a NOT EXISTS clause. You can identify all 
the queries in your environment that use these clauses and validate the benefits 
of this variable for your specific environment. 
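
One simple way to build the list of candidate queries is to scan the statement text you have already collected (for example, from an application's SQL files) for the two clauses. The following Python sketch shows the idea. It is a naive text match that does not parse SQL, so it can also match text inside string literals or comments; the function name and the sample statements, including their table names, are hypothetical.

```python
import re

# Whole-word, case-insensitive matches for the two clauses.
NOT_IN = re.compile(r"\bNOT\s+IN\b", re.IGNORECASE)
NOT_EXISTS = re.compile(r"\bNOT\s+EXISTS\b", re.IGNORECASE)


def antijoin_candidates(statements):
    """Return (statement, uses_not_in, uses_not_exists) for matching statements."""
    hits = []
    for sql in statements:
        uses_not_in = bool(NOT_IN.search(sql))
        uses_not_exists = bool(NOT_EXISTS.search(sql))
        if uses_not_in or uses_not_exists:
            hits.append((sql, uses_not_in, uses_not_exists))
    return hits


# Hypothetical statements, for illustration only.
workload = [
    "SELECT * FROM sales s WHERE s.cust_id NOT IN (SELECT cust_id FROM blocked)",
    "SELECT * FROM sales s WHERE not exists (SELECT 1 FROM returns r WHERE r.id = s.id)",
    "SELECT COUNT(*) FROM sales",
]

for sql, uses_not_in, uses_not_exists in antijoin_candidates(workload):
    print(uses_not_in, uses_not_exists, sql)
```

The statements flagged this way are the ones whose access plans and run times are worth comparing with and without the DB2_ANTIJOIN setting.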

DB2_PARALLEL_IO 

The DB2_PARALLEL_IO setting determines the prefetching parallelism that 
corresponds to the number of parallel prefetch requests to satisfy your table 
space prefetch size. 

Recent Linux and AIX environments use automatic storage with a single storage 
path, so all table spaces have a single container. The single container is located 
on a redundant array of independent disks (RAID) array with multiple disk 
spindles. This registry variable needs to be enabled to benefit from the 
prefetching parallelism and leverage all disk spindles available on each LUN. 

The following settings determine how sequential prefetching is performed in your 
environment: 

► DB2_PARALLEL_IO setting: n:d 

- n is the table space ID where the parallelism is to be applied. All IBM 
Smart Analytics System settings use the wildcard (*) to specify all table 
spaces. 

- d represents the number of disks for the containers of the specific table 
space. The best practice is to set this value either to the number of active 
spindles or to the total number of spindles, depending on your RAID level and OS. 

► Table space EXTENT size: Each individual prefetch request is the size of 
an extent. 

► Table space PREFETCH size: The prefetch size represents the number of 
pages requested at a time for a specific table space. Most IBM Smart 
Analytics System environments use a prefetch size of AUTOMATIC, except 
for the IBM Smart Analytics System 7700. When the prefetch size is set to 
AUTOMATIC, for this specific environment with a single container per table 
space, the prefetch size is determined as follows: 

Prefetch size (pages) = DB2_PARALLEL_IO disk setting x extent size (pages) 

For example, based on Table 7-3, we have for the 5600 V1: 

- PREFETCHSIZE=AUTOMATIC 

- DB2_PARALLEL_IO=*:5 

- Table space extent size=32 

The prefetch size is computed as 32 pages x 5 = 160 pages. 

The 7700 environment does not use the AUTOMATIC setting for 
PREFETCHSIZE. It uses a fixed prefetch size of 384 pages instead, which 
provides good performance in that environment. 

► Database configuration NUM_IOSERVERS: Most IBM Smart Analytics 
System environments use the AUTOMATIC setting. NUM_IOSERVERS in 
this case is determined by the following formula: 

NUM_IOSERVERS = MAX(number of containers in the same stripe set from 
all your table spaces) x DB2_PARALLEL_IO disk setting 

Because IBM Smart Analytics System uses a single container per table 
space, NUM_IOSERVERS is equal to the DB2_PARALLEL_IO disk 
setting. The 7700 uses a fixed NUM_IOSERVERS value of 12. 

► Buffer pool setting: Linux-based environments use vectored I/O to perform 
sequential prefetching, because this provides good performance in Linux 
environments. AIX-based environments use block-based I/O with a block 
area buffer pool. The block size is equal to the extent size. 

Table 7-3 shows a summary of the number of parallel prefetch requests, as well 
as the total prefetch request size, depending on your environment: 


Table 7-3 Summary of the number of parallel prefetch requests and the total prefetch request size 

Environment   RAID array and    EXTENT   PREFETCH   NUM_IOSERVERS  DB2_PARALLEL_IO  Number of  Total
              segment size      size     SIZE                                       parallel   prefetch
                                (pages)  (pages)                                    prefetch   size (KB)
                                                                                    requests
------------  ----------------  -------  ---------  -------------  ---------------  ---------  ---------
5600 V1 and   RAID-6 (4+P+Q),   32       AUTO(160)  AUTO(5)        *:5              5          2560
5600 V2       segment 128K
7600          RAID-5 (7+P),     16       AUTO(128)  AUTO(8)        *:8              8          2048
              segment 256K
7700          RAID-6 (10+P+Q),  16       384        12                              24         6144
              segment 256K


The extent size is a multiple of the RAID segment size for all IBM Smart Analytics 
System offerings, to align I/O and optimize table space access. In IBM 
Smart Analytics System 5600 environments, the RAID segment size is set to 
128K, with an extent size of 512K. Each extent is striped across four disks, 
which corresponds to the number of active disks in the array. On the IBM 
Smart Analytics System 7600 and 7700 environments, each extent is equal to 
the segment size of 256K, so each extent is on one disk. 

For all environments except the IBM Smart Analytics System 7700, the prefetch 
size is equal to the EXTENT size times the DB2_PARALLEL_IO disk setting. 
DB2 satisfies a prefetching request by running a number of prefetch requests 
in parallel equal to the DB2_PARALLEL_IO disk setting. Each of these prefetch 
requests is one extent in size and is assigned to a DB2 prefetcher. For example, on the 
IBM Smart Analytics System 7600, DB2 assigns eight prefetch requests of 
one extent each, one to each prefetcher. 

For the IBM Smart Analytics System 7700 environment, we have a fixed number 
of 12 prefetchers (matching the number of spindles in the RAID array). The 
prefetch size is also fixed at 384 pages. Because the prefetch size is not 
AUTOMATIC, DB2 computes the prefetching parallelism degree by dividing the 
PREFETCH size by the extent size, which gives 24 (384 / 16). DB2 satisfies the 
prefetch request by assigning in parallel 24 requests of one extent each to 12 
prefetchers. This results in two extent requests per prefetcher. This 
setting achieves more aggressive prefetching, which leverages the 
higher I/O bandwidth available with the 7700. 

Based on performance testing, these settings provide good sequential I/O 
throughput from the storage and are tied to the specific storage configuration and 
specifications. Do not change this parameter value. 
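
The arithmetic behind Table 7-3 can be checked with a few lines of Python. This is a worked example only, not a sizing tool. The function names are our own, and the 16 KB page size is an assumption inferred from the table itself (2560 KB divided by 160 pages).

```python
PAGE_SIZE_KB = 16  # assumption: inferred from Table 7-3 (2560 KB / 160 pages)


def automatic_prefetch_pages(parallel_io_disks, extent_pages):
    """AUTOMATIC prefetch size with a single container per table space."""
    return parallel_io_disks * extent_pages


def parallel_requests(prefetch_pages, extent_pages):
    """Number of one-extent prefetch requests needed for one prefetch."""
    return prefetch_pages // extent_pages


def total_prefetch_kb(prefetch_pages, page_size_kb=PAGE_SIZE_KB):
    """Total size of one prefetch request in KB."""
    return prefetch_pages * page_size_kb


# 5600 V1/V2: DB2_PARALLEL_IO=*:5, extent size of 32 pages
pages_5600 = automatic_prefetch_pages(5, 32)

# 7600: DB2_PARALLEL_IO=*:8, extent size of 16 pages
pages_7600 = automatic_prefetch_pages(8, 16)

# 7700: fixed prefetch size of 384 pages, extent size of 16 pages, 12 prefetchers
requests_7700 = parallel_requests(384, 16)
extents_per_prefetcher_7700 = requests_7700 // 12

for name, pages in (("5600", pages_5600), ("7600", pages_7600), ("7700", 384)):
    print(name, pages, "pages", total_prefetch_kb(pages), "KB")
print("7700:", requests_7700, "requests,", extents_per_prefetcher_7700, "per prefetcher")
```

The results match the prefetch sizes, request counts, and total sizes listed in Table 7-3, including the two extent requests per prefetcher on the 7700.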


Database manager configuration 

All the database manager configuration settings are the default ones, except for 
the settings listed in Table 7-4. 


Table 7-4 Database manager configuration parameters 


DBM configuration 

CPUSPEED 

COMM_BANDWIDTH 

NUMDB 

DIAGPATH 


SHEAPTHRES 


FCM_NUM_BUFFERS 


2.36E-07 


2.36E-07 


100 


100 




2.70E-07 


/db2fs/bcuaix/ 

db2dump 


/db2fs/bcuaix/ 

db2dump 


/db2path/bcuaix 

/db2dump 


600000, 

1200000 with SSD 
option 


600000, 
1400000 wi 
option 


AUTOMATIC(131072) 


AUTOMATIC(131072) 


AUTOMATIC (8192) 


2.70E-07 


100 


/db2fs/bcuaix 

/db2dump 


1400000 


AUTOMATIC (16384) 


CPUSPEED and COMM_BANDWIDTH 

The database manager configuration parameters CPUSPEED and 
COMM_BANDWIDTH are used by the DB2 optimizer to compute the 
optimal access plan. CPUSPEED helps the optimizer estimate the CPU cost 
of low-level operations performed during query execution. 
COMM_BANDWIDTH helps the optimizer evaluate the cost of 
internal communications between database partitions during query 
processing. 

The current settings shown for the IBM Smart Analytics System are a good 
baseline to reflect the system specifications to the DB2 optimizer. Any change to 
these values requires thorough testing of your entire query workload, because it 
might impact access plans. 

SHEAPTHRES 

IBM Smart Analytics System does not use shared sorts. Private sorts provide an 
optimal performance for sort intensive queries in this specific environment. 
SHEAPTHRES is set to cap the sum of private sort memory allocated in 
SORTHEAP concurrently for all agents connected to a logical database partition 
to perform private sorts. SORTHEAP is used for sorting, as well as hash joins 
processing. 








The SHEAPTHRES value represents a cap per logical database partition. It is 
currently set to around 28% to 33% of the total RAM available per logical partition 
for the IBM Smart Analytics System 5600 and 7600/7700, which is relatively 
conservative. 

This parameter can be further tuned depending on the nature of your query 
workload (for example, sorts and hash joins), and its concurrency. This 
parameter can be tuned in conjunction with SORTHEAP. 
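
When reasoning about these two parameters together, it helps to convert them to common units. Both SHEAPTHRES and SORTHEAP are expressed in 4 KB pages. The following Python sketch converts a SHEAPTHRES value to bytes, expresses it as a share of the RAM of one logical partition, and counts how many agents could each hold a full SORTHEAP before the threshold is reached. The values 600000 and 12000 appear in Table 7-4 and Table 7-5; the 8 GiB of RAM per logical partition is a hypothetical figure used only to illustrate the calculation.

```python
PAGE_BYTES = 4096  # SHEAPTHRES and SORTHEAP are expressed in 4 KB pages


def sheapthres_bytes(sheapthres_pages):
    """Private sort memory cap per logical partition, in bytes."""
    return sheapthres_pages * PAGE_BYTES


def sheapthres_share(sheapthres_pages, ram_bytes_per_partition):
    """Fraction of a logical partition's RAM covered by SHEAPTHRES."""
    return sheapthres_bytes(sheapthres_pages) / ram_bytes_per_partition


def concurrent_full_sorts(sheapthres_pages, sortheap_pages):
    """Number of full SORTHEAP allocations that fit under SHEAPTHRES."""
    return sheapthres_pages // sortheap_pages


SHEAPTHRES = 600000                # pages, a value listed in Table 7-4
SORTHEAP = 12000                   # pages, 5600 V1 value in Table 7-5
RAM_PER_PARTITION = 8 * 1024 ** 3  # hypothetical: 8 GiB per logical partition

share = sheapthres_share(SHEAPTHRES, RAM_PER_PARTITION)
print(sheapthres_bytes(SHEAPTHRES), "bytes")
print(round(share * 100, 1), "% of partition RAM")
print(concurrent_full_sorts(SHEAPTHRES, SORTHEAP), "full SORTHEAP allocations")
```

Substituting the RAM actually available per logical partition on your system shows how much headroom a proposed SHEAPTHRES or SORTHEAP change would leave.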

You can monitor the occurrences of post-threshold sorts by using database 
snapshot monitoring, the db2top utility, or the new DB2 monitoring 
facility. Example 7-18 shows a query against the new monitoring interface that 
reports post-threshold sorts, that is, sorts requested after SHEAPTHRES has been 
exceeded. The query shows the occurrences of post-threshold sorts for each 
application, aggregated for all the partitions. 

Example 7-18 Query shows post threshold sorts 

SELECT application_handle AS app_handle, 
SUM(total_sorts) AS sum_sorts, 
SUM(sort_overflows) AS sum_overflows, 
SUM(post_threshold_sorts) AS sum_post_tresh_sort 
FROM TABLE(mon_get_connection(CAST(NULL AS bigint), -2)) 
GROUP BY application_handle 
ORDER BY application_handle 

APP_HANDLE SUM_SORTS SUM_OVERFLOWS SUM_POST_TRESH_SORT 


76 0 

77 0 

78 0 

79 0 

80 0 

81 0 

82 0 

83 0 

84 0 

85 0 

86 0 

87 0 

94 0 


13 record(s) selected. 


3 

0 

2 

3 

0 

0 

0 

0 

8 

0 

8 

0 

0 


Example 7-19 shows occurrences of post threshold sorts and hash joins from the 
database manager global snapshot. 

Example 7-19 Using snapshot for the occurrences of post threshold sorts and hash joins 

# db2 get snapshot for dbm global | egrep -i "hash|sort" | grep -v "Sorting" 

Private Sort heap allocated = 43200 
Private Sort heap high water mark = 901680 
Post threshold sorts = 27 
Piped sorts requested = 85 
Piped sorts accepted = 85 
Hash joins after heap threshold exceeded = 48 
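
Snapshot output of this kind is a list of "name = value" lines, which makes it easy to post-process when you collect it regularly. The following Python sketch turns the filtered output into a dictionary and picks out the counters that indicate the sort heap threshold was exceeded. The helper name parse_snapshot is our own, and the sample text is the output shown in Example 7-19.

```python
def parse_snapshot(text):
    """Turn 'name = value' snapshot lines into a {name: int} dictionary."""
    counters = {}
    for line in text.splitlines():
        name, sep, value = line.partition("=")
        value = value.strip()
        # Keep only lines that carry a plain integer value.
        if sep and value.isdigit():
            counters[name.strip()] = int(value)
    return counters


# Output shown in Example 7-19.
SNAPSHOT = """\
Private Sort heap allocated = 43200
Private Sort heap high water mark = 901680
Post threshold sorts = 27
Piped sorts requested = 85
Piped sorts accepted = 85
Hash joins after heap threshold exceeded = 48
"""

counters = parse_snapshot(SNAPSHOT)

# Counters that are non-zero when SHEAPTHRES has been exceeded.
threshold_counters = (
    "Post threshold sorts",
    "Hash joins after heap threshold exceeded",
)
pressure = {name: counters[name] for name in threshold_counters if counters[name] > 0}
print(pressure)
```

Comparing these counters between two collection times indicates whether the threshold is still being exceeded under the current workload, because the snapshot values are cumulative.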


FCM_NUM_BUFFERS 

The FCM_NUM_BUFFERS configuration parameter has been configured with 
an initial value that is a good starting point for most environments. For both 
Linux-based and AIX-based environments, DB2 preallocates additional shared 
memory to accommodate higher requirements and increases the resources 
dynamically during runtime, in a manner that is transparent to applications. However, if 
the requirement exceeds the additional memory reserved by DB2, you might still 
be exposed to running out of fast communications manager (FCM) resources. 

From a best practice perspective, you can adjust this value closer to your peak 
requirement and limit the automatic adjustments, which can have a small impact 
on your overall system performance. You can monitor the FCM resources 
through the db2pd utility or the database manager snapshot to verify whether the initial 
values have been increased by DB2, and to check their current level of utilization. 

Example 7-20 shows an example of a db2pd -fcm command that was run on all 
partitions with a summary of the FCM usage per physical host. Note that the 
FCM resources are shared between logical partitions on a given physical 
partition. So, the usage summary information needs to be collected from only one 
logical partition per physical partition. 

Example 7-20 db2pd -fcm 

rah 'db2pd -fcm | egrep "Usage|==|Partition|Buffers:|Channels:|Sessions:|LWM:" | grep -v Status' 

Database Partition 0 -- Active -- Up 0 days 00:13:39 -- Date 09/24/2010 19:35:58 
FCM Usage Statistics 
Total Buffers: 131565 
Free Buffers: 131542 
Buffers LWM: 131287 
Max Buffers: 1573410 
Total Channels: 2685 
Free Channels: 2644 
Channels LWM: 2543 
Max Channels: 1573410 
Total Sessions: 895 
Free Sessions: 860 
Sessions LWM: 850 
ISAS56R1D1: db2pd -fcm | egrep ... completed ok 

Database Partition 1 -- Active -- Up 0 days 00:13:41 -- Date 09/24/2010 19:36:00 
FCM Usage Statistics 
Total Buffers: 526260 
Free Buffers: 526225 
Buffers LWM: 526144 
Max Buffers: 2097880 
Total Channels: 10740 
Free Channels: 10687 
Channels LWM: 10611 
Max Channels: 2097880 
Total Sessions: 3580 
Free Sessions: 3463 
Sessions LWM: 3425 
ISAS56R1D2: db2pd -fcm | egrep ... completed ok 

Database Partition 5 -- Active -- Up 0 days 00:13:40 -- Date 09/24/2010 19:36:01 
FCM Usage Statistics 
Total Buffers: 526260 
Free Buffers: 526227 
Buffers LWM: 526124 
Max Buffers: 2097880 
Total Channels: 10740 
Free Channels: 10688 
Channels LWM: 10612 
Max Channels: 2097880 
Total Sessions: 3580 
Free Sessions: 3464 
Sessions LWM: 3427 
ISAS56R1D3: db2pd -fcm | egrep ... completed ok 


In the output shown in Example 7-20, the total number of buffers on each data 
host (526260) is close to the initial FCM_NUM_BUFFERS configuration for its 
four logical database partitions (131072 x 4 = 524288). 
So, no significant automatic adjustment has occurred. The low watermarks (LWM) for the 
buffers and the channels are also very close to the totals, so the system has not 
come close to running out of FCM resources. Because this parameter is set to 
AUTOMATIC, DB2 can increase or decrease this value depending on the 
workload requirements on the system. 
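
The comparison between the totals and the low watermarks can be expressed as a small calculation. In the following Python sketch we assume, consistent with the description above, that the LWM value is the lowest number of free buffers or channels observed, so that the difference from the total is the peak number in use. The function names are our own, and the figures are those reported for database partition 1 in Example 7-20.

```python
def fcm_peak_usage(total, free_lwm):
    """Highest number of resources in use at once (total minus lowest free count)."""
    return total - free_lwm


def fcm_peak_utilization(total, free_lwm):
    """Peak usage as a fraction of the total currently allocated."""
    return fcm_peak_usage(total, free_lwm) / total


# Figures for database partition 1 in Example 7-20.
buffers_total, buffers_lwm = 526260, 526144
channels_total, channels_lwm = 10740, 10611

peak_buffers = fcm_peak_usage(buffers_total, buffers_lwm)
peak_channels = fcm_peak_usage(channels_total, channels_lwm)

print("peak buffers in use:", peak_buffers)
print("peak channels in use:", peak_channels)
print("peak channel utilization:", fcm_peak_utilization(channels_total, channels_lwm))
```

A peak usage that stays a small fraction of the total, as it does here, indicates that the initial configuration leaves ample headroom for the workload observed so far.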

This information is also available through the MON_GET_FCM relational 
monitoring function available starting with DB2 9.7 Fix Pack 2. 

Example 7-21 shows an example of MON_GET_FCM usage that allows you to 
monitor the low watermark (bottom) for buffers and channels for each database 
partition (member). 

Example 7-21 Using MON_GET_FCM function 

# db2 "select substr(hostname,1,10) as hostname, member, buff_total, 
buff_free_bottom, ch_total, ch_free_bottom from table (mon_get_fcm(-2)) order 
by member" 

HOSTNAME   MEMBER BUFF_TOTAL BUFF_FREE_BOTTOM CH_TOTAL CH_FREE_BOTTOM 
ISAS56R1D1      0     131565           131303     2685           2648 
ISAS56R1D2      1     526260           526193    10740          10631 
ISAS56R1D2      2     526260           526193    10740          10631 
ISAS56R1D2      3     526260           526193    10740          10631 
ISAS56R1D2      4     526260           526193    10740          10631 
ISAS56R1D3      5     526260           526187    10740          10632 
ISAS56R1D3      6     526260           526187    10740          10632 
ISAS56R1D3      7     526260           526187    10740          10632 
ISAS56R1D3      8     526260           526187    10740          10632 

9 record(s) selected. 


In a well-balanced environment with no data skew, particular queries with an 
inefficient access plan can also cause a sudden increase in FCM resource 
usage. To obtain detailed information about FCM resource usage per 
application, and to identify an application consuming a high amount of FCM 
resources, you can use the full db2pd -fcm output or the 
MON_GET_CONNECTION output. 

Example 7-22 shows an output of db2pd -fcm. The FCM buffer and channel 
usage is low and well balanced between applications, with no 
application standing out. 

Example 7-22 db2pd -fcm output 

db2pd -fcm 

Database Partition 1 -- Active -- Up 0 days 00:02:38 -- Date 09/24/2010 20:21:21 

FCM Usage Statistics 

Total Buffers: 526260 
Free Buffers: 526212 
Buffers LWM: 526145 
Max Buffers: 2097880 

Total Channels: 10740 
Free Channels: 10655 
Channels LWM: 10655 
Max Channels: 2097880 

Total Sessions: 3580 
Free Sessions: 3451 
Sessions LWM: 3451 

Partition  Bufs Sent  Bufs Recv  Status 
0          415734     99         Active 
1          115        115        Active 
2          1          1          Active 
3          1          1          Active 
4          1          1          Active 
5          1          1          Active 
6          1          1          Active 
7          1          1          Active 
8          1          1          Active 

Buffers Current Consumption 

AppHandl  [nod-index]   TimeStamp   Buffers In-use 
0         [000-00000]   0           16 
76        [000-00076]   3392975566  6 
80        [000-00080]   3392975567  5 
75        [000-00075]   3392975564  5 
78        [000-00078]   3392975567  4 
81        [000-00081]   3392975568  4 
79        [000-00079]   3392975567  4 
77        [000-00077]   3392975566  4 

Channels Current Consumption 

AppHandl  [nod-index]   TimeStamp   Channels In-use 
0         [000-00000]   0           16 
77        [000-00077]   3392975566  8 
76        [000-00076]   3392975566  8 
75        [000-00075]   3392975564  8 
78        [000-00078]   3392975567  8 
79        [000-00079]   3392975567  8 
80        [000-00080]   3392975567  8 
81        [000-00081]   3392975568  8 
65601586  [1001-00050]  0           4 
65546     [001-00010]   0           2 
131082    [002-00010]   0           2 
262154    [004-00010]   0           2 
196618    [003-00010]   0           2 
65587     [001-00051]   3392975446  1 

Buffers Consumption HWM 

AppHandl  [nod-index]   TimeStamp   Buffers Used 

Channels Consumption HWM 

AppHandl  [nod-index]   TimeStamp   Channels Used 

Database configuration settings 

In this section, we present the database configuration settings for the IBM Smart 
Analytics System 5600 V1/V2 and 7600/7700 environments. We then discuss 
further the parameters that have an impact on performance. 

Table 7-5 contains the DB2 database configuration parameter settings set by 
default on these environments. 

Table 7-5 DB2 database configuration parameters 

Configuration     5600 V1            5600 V2               7600              7700
parameter
----------------  -----------------  --------------------  ----------------  --------------------
LOCKLIST          16384              16384                 16384             16384
MAXLOCKS          10                 10                    10                10
PCKCACHESZ        -1                 -1                    -1                -1
SORTHEAP          12000              12000,                20000             35000
                                     35000 with SSD
                                     option
LOGBUFSZ          2048               2048                  2048              2048
UTIL_HEAP_SZ      65536              65536                 65536             65536
STMTHEAP          10000              10000                 10000             10000
LOGFILSIZ         12800              12800                 12800             12800
LOGPRIMARY        50                 50                    50                50
LOGSECOND         0                  0                     0                 0
NEWLOGPATH        /db2fs/bculinux    /db2plog/bculinux     /db2path/bcuaix   /db2plog/bcuaix
MIRRORLOGPATH     Not available      /db2mlog/bculinux     Not available     /db2mlog/bcuaix
CHNGPGS_THRESH    Default (60)       Default (60),         Default (60)      30
                                     30 with SSD option
WLM_COLLECT_INT   Not set            Not set               20                20
DFT_PREFETCH_SZ   AUTO               AUTO                  AUTO              384
NUM_IOSERVERS     AUTO(5)            AUTO(5)               AUTO(8)           12
NUM_IOCLEANERS    AUTO(7)            AUTO(3),              AUTO(1)           AUTO(7),
                                     explicitly set to 3                     explicitly set to 7
                                     on the                                  on the
                                     administration node                     administration node


LOCKLIST and MAXLOCKS 

LOCKLIST represents the amount of memory in the database shared memory 
set that is used to store the locks for all applications connected to the database. It 
is currently set to 16384. 

A high value for LOCKLIST can result in performance degradation associated 
with the traversal of the lock list by each application each time it requests a 
lock. A value that is too low might result in premature lock escalations, which can hurt 
the concurrency of the applications in the system. The IBM Smart Analytics 
System User Guide corresponding to your configuration contains detailed 
information about the sizing of the LOCKLIST. The LOCKLIST value set initially 
provides a good starting point. 

You can use the following db2pd command to dump the contents of your locklist 
for a particular database partition: 
db2pd -db bcukit -locks 

MAXLOCKS is the percentage of the lock list that can be held by a 
single application before a lock escalation occurs. A lock escalation consists of 
converting multiple row-level locks on the same table into a single table-level 
lock, which saves space in the lock list. 
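
To relate these two settings to the byte counts reported by the snapshot monitor, the following Python sketch converts LOCKLIST (expressed in 4 KB pages) to bytes and derives the share of the lock list that a single application can hold before escalation. It is a back-of-the-envelope calculation with function names of our own choosing, using the LOCKLIST and MAXLOCKS values from Table 7-5.

```python
PAGE_BYTES = 4096  # LOCKLIST is expressed in 4 KB pages


def locklist_bytes(locklist_pages):
    """Total lock list memory in bytes."""
    return locklist_pages * PAGE_BYTES


def escalation_threshold_bytes(locklist_pages, maxlocks_percent):
    """Lock list memory one application can hold before lock escalation."""
    return locklist_bytes(locklist_pages) * maxlocks_percent // 100


LOCKLIST = 16384  # pages, from Table 7-5
MAXLOCKS = 10     # percent, from Table 7-5

total_bytes = locklist_bytes(LOCKLIST)
threshold_bytes = escalation_threshold_bytes(LOCKLIST, MAXLOCKS)

print("lock list size:", total_bytes, "bytes")
print("per-application escalation threshold:", threshold_bytes, "bytes")
```

These figures can be compared with the "Lock list memory in use (Bytes)" value of a database snapshot to judge how close the system is to the configured limits.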

Another important locking parameter is LOCKTIMEOUT, which is set by default 
to -1. This setting means that applications in lock-wait status wait 
indefinitely for a lock instead of timing out. This setting might not be 
appropriate for all environments and might need to be adjusted based on the 
behavior of your specific applications. 

If your applications are experiencing locking issues (deadlocks or lock timeouts), 
it is necessary to identify: 

► The various applications involved 

► The SQL statements from the conflicting applications 

► The database objects on which locking issues are occurring 

► The nature and duration of the locks 

This information allows you to understand the scenario involving the applications 
and utilities in conflict. The database snapshot or the relational monitoring function 
SNAP_GET_DB provides a high-level overview of the deadlocks and lock timeouts 
occurring in your database. 

Example 7-23 shows an excerpt of a database snapshot with database 
level lock information. 

Example 7-23 Snapshot of lock information 

# db2_all 'db2 "get snapshot for database on bcukit" | grep -i lock | egrep -vi "MDC|Pool"' 

Locks held currently = 21 
Lock waits = 5 
Time database waited on locks (ms) = 953 
Lock list memory in use (Bytes) = 31872 
Deadlocks detected = 0 
Lock escalations = 0 
Exclusive lock escalations = 0 
Agents currently waiting on locks = 0 
Lock Timeouts = 0 
Internal rollbacks due to deadlock = 0 
ISAS56R1D1: db2 "get snapshot ... completed ok 

Locks held currently = 222 
Lock waits = 1 
Time database waited on locks (ms) = 5 
Lock list memory in use (Bytes) = 70656 
Deadlocks detected = 0 
Lock escalations = 0 
Exclusive lock escalations = 0 
Agents currently waiting on locks = 0 
Lock Timeouts = 3 
Internal rollbacks due to deadlock = 0 
ISAS56R1D2: db2 "get snapshot ... completed ok 


Example 7-24 shows an example of the SNAP_GET_DB query to monitor locks 
usage. Note that the information returned is aggregated for all partitions. 

Example 7-24 Monitor locks using SNAP_GET_DB 

db2 "select APPLS_CUR_CONS, LOCKS_HELD, LOCK_WAITS, DEADLOCKS, LOCK_ESCALS, 
LOCK_TIMEOUTS from TABLE(SNAP_GET_DB('BCUKIT', -2))" 


A few monitoring tools are available with DB2 to drill down further into the scenario of 
a lock escalation or deadlock. Consult the DB2 9.7 Information Center for 
more details about each of these options: 

► db2pd utility: The DB2 9.7 Information Center contains detailed information 
about how to diagnose locking issues with db2pd: 

http://publib.boulder.ibm.com/infocenter/db2luw/v9r7/topic/com.ibm.db2.luw.admin.trb.doc/doc/c0054595.html 

► Lock and application snapshots. 

► Relational monitoring functions: Provide aggregated information for all 
database partitions: 

- MON_GET_APPL_LOCKWAIT: Returns information about the locks that each 
application is waiting for. 

- MON_GET_LOCKS: Lists all locks currently acquired for the applications 
connected to the database. The advantage of using this method is that you 
can specify search arguments to select locks for a particular table. 

- MON_FORMAT_LOCK_NAME: Can be used to format the binary name of 
a lock into a human readable format. 

- MON_GET_CONNECTION: Can give you locking information per 
application handle. 

- MON_LOCKWAITS: Useful administrative view that lists all applications 
in lock-wait status, along with lock details and the application holding the 
lock. 

For locking issues, it is essential to capture either information at the exact time 
the lock waits occur or historical information about the lock waits. It might be 
difficult to reproduce a scenario at will and collect diagnostics. The following 
ways can be used to get historical information or trigger diagnostic data 
collection when the problem is occurring: 

► Event monitor: 

DB2 9.7 provides a new event monitor for locking: 

CREATE EVENT MONITOR ... FOR LOCKING 

This event monitor captures information about all locking-related events, 
including deadlocks and lock timeouts. Previously, the only event monitor 
available for locking was limited to monitoring deadlock occurrences. The new 
event monitor provides more complete information for both lock timeouts and 
deadlocks. 

► DB2 callout script: 

Another advanced option to understand the exact scenario leading to a 
deadlock or lock timeout is to enable a DB2 callout script (a db2pdcfg setting 
to catch the error and trigger the diagnostic data collection, and db2cos for 
the actual collection). The script collects a customizable set of diagnostics 
at the exact time the deadlock or lock timeout occurs. This option is generally 
used by IBM Smart Analytics System support services to further narrow down 
complex or unclear locking scenarios. 

PCKCACHESZ 

IBM Smart Analytics System sets the package cache to -1, which corresponds to 
(MAXAPPLS*8). MAXAPPLS is set by default to AUTOMATIC, which can be 
adjusted by DB2 to allow more connections. 

Example 7-25 shows how to check for your actual package cache value in effect 
for all your nodes. 

Example 7-25 Check PCKCACHESZ value 

db2_all 'db2pd -db bcukit -dbcfg | egrep -i "value |pckcachesz"|grep -vi freq' 

Description Memory Value Disk Value 

PCKCACHESZ (4KB) 320 320 

ISAS56R1D1: db2pd -db bcukit -dbcfg ... completed ok 

Description Memory Value Disk Value 

PCKCACHESZ (4KB) 320 320 

ISAS56R1D2: db2pd -db bcukit -dbcfg ... completed ok 


To check whether this value is sufficient for your environment, check for 
package cache overflows. This information is available from db2pd with the 
-dynamic or -static flag, or from a database snapshot. SNAP_GET_DB provides a 
high level overview of any package cache overflows. 

Example 7-26 shows how to use SNAP_GET_DB to capture this information, along 
with the output. 

Example 7-26 Check package cache information 

db2 "select PKG_CACHE_LOOKUPS, PKG_CACHE_INSERTS, PKG_CACHE_NUM_OVERFLOWS, 
PKG_CACHE_SIZE_TOP from TABLE (SNAP_GET_DB('BCUKIT', -2))" 

PKG_CACHE_LOOKUPS PKG_CACHE_INSERTS PKG_CACHE_NUM_OVERFLOWS PKG_CACHE_SIZE_TOP 

               23                15                       0             488697 

1 record(s) selected. 


If overflows occur, or if the package cache high watermark 
(PKG_CACHE_SIZE_TOP, reported in bytes) is close to PCKCACHESZ (specified in 
4 KB pages, so convert it to bytes before comparing), you can increase your 
package cache. 
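
The comparison can be worked through with the values from Examples 7-25 and 
7-26. The following Python sketch is ours, not part of the DB2 tooling; the 
function name is an illustrative assumption. It converts PCKCACHESZ from 4 KB 
pages to bytes and expresses the high watermark as a fraction of the configured 
size. 

```python
PAGE_SIZE_BYTES = 4096  # PCKCACHESZ is expressed in 4 KB pages


def pkg_cache_usage(pckcachesz_pages, pkg_cache_size_top_bytes):
    """Return the high watermark as a fraction of the configured package cache."""
    configured_bytes = pckcachesz_pages * PAGE_SIZE_BYTES
    return pkg_cache_size_top_bytes / configured_bytes


# Values taken from Example 7-25 (PCKCACHESZ) and Example 7-26 (PKG_CACHE_SIZE_TOP)
configured = 320 * PAGE_SIZE_BYTES
usage = pkg_cache_usage(320, 488697)
print(f"configured: {configured} bytes, high watermark: {usage:.1%}")
```

With these values the high watermark is a little over a third of the configured 
package cache and no overflows are reported, so this sample system shows no 
need for an increase. 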

You can also monitor the number of package cache inserts to identify an unusual 
pattern in package cache usage. To narrow down which application performs most 
of the package cache inserts, use the relational monitoring function 
MON_GET_CONNECTION, which provides this information per application. 

The value set by default with IBM Smart Analytics System is fairly conservative. 
You might need to increase it depending on your workload. 

SORTHEAP 

The IBM Smart Analytics System uses private sorts. The ratio of SHEAPTHRES 
divided by SORTHEAP determines the number of concurrent sorts supported 
before hitting the cap set by SHEAPTHRES. When the sum of the private sort 
allocations on each partition is close to SHEAPTHRES, DB2 starts limiting 
sort memory by granting smaller allocations to the various applications so 
that the total remains within the cap. 

Table 7-6 shows a summary of the default sort concurrency level for particular 
IBM Smart Analytics System environments. This concurrency level represents 
the theoretical number of concurrent sort requests that can be run before sort 
requests start being capped by the DB2 database manager. As explained next, 
these values can be adjusted to meet your specific needs. 


Table 7-6 Summary of default concurrency level 

Environment        SHEAPTHRES   SORTHEAP   Sort concurrency level 
5600 V1/V2             600000      12000                       50 
5600 V1 with SSD      1200000      12000                      100 
5600 V2 with SSD      1400000      35000                       40 
7600                   600000      20000                       30 
7700                  1400000      35000                       40 
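
The concurrency levels in Table 7-6 follow directly from the ratio of 
SHEAPTHRES to SORTHEAP. The following Python sketch is ours and only 
reproduces that arithmetic; the function and dictionary names are illustrative 
assumptions. 

```python
def sort_concurrency(sheapthres, sortheap):
    """Theoretical number of full-size private sorts before SHEAPTHRES caps them."""
    return sheapthres // sortheap


# (SHEAPTHRES, SORTHEAP) defaults as listed in Table 7-6
defaults = {
    "5600 V1/V2": (600000, 12000),
    "5600 V1 with SSD": (1200000, 12000),
    "5600 V2 with SSD": (1400000, 35000),
    "7600": (600000, 20000),
    "7700": (1400000, 35000),
}

levels = {env: sort_concurrency(*values) for env, values in defaults.items()}
for env, level in levels.items():
    print(f"{env}: {level}")
```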


The IBM Smart Analytics System provides initial values for SHEAPTHRES and 
SORTHEAP which are a good starting point for most analytical query workloads, 
which are sort and hash join intensive. However, you can adjust these settings 
depending on your specific environment: 

► If post threshold sorts and hash joins occur (see “SHEAPTHRES” on page 222 
for further details), you can try decreasing SORTHEAP to allow for more 
concurrency. 

► You can monitor occurrences of sort and hash join overflows. Overflows can 
happen with the large amounts of data processed in IBM Smart Analytics System 
environments and are not necessarily a problem. However, consider increasing 
SORTHEAP if you see any of the following conditions: 

- The number of overflows is increasing. 

- The ratio of sort overflows to the total number of sorts is increasing or 
is high. 

- The ratio of hash join overflows to the total number of hash joins is 
increasing or is high. 

After you increase SORTHEAP, you might see an increase of post threshold sorts 
or hash joins. In that case, depending on the memory available on the system, 
you can increase SHEAPTHRES proportionally to maintain the same level of 
concurrency. 
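
As a sketch of what "proportionally" means here, the following Python snippet 
(ours, with a hypothetical new SORTHEAP value of 16000 chosen only for 
illustration) scales SHEAPTHRES so that the ratio SHEAPTHRES/SORTHEAP, and 
therefore the theoretical sort concurrency, stays the same. 

```python
def scaled_sheapthres(old_sheapthres, old_sortheap, new_sortheap):
    """Scale SHEAPTHRES by the same factor as SORTHEAP to keep the ratio constant."""
    return old_sheapthres * new_sortheap // old_sortheap


# Starting point: the 5600 V1/V2 defaults from Table 7-6 (concurrency level 50).
# The new SORTHEAP of 16000 is a hypothetical value, not a recommendation.
new_sheapthres = scaled_sheapthres(600000, 12000, 16000)
print(new_sheapthres, new_sheapthres // 16000)
```

Whether the larger SHEAPTHRES is acceptable still depends on the memory 
available on each partition. 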

Example 7-27 shows how to get a global aggregated view of how many sort 
overflows and hash join overflows are occurring cluster wide. You can use the 
method shown in Example 7-18 on page 223 to further narrow down the 
applications performing the sorts. 

Example 7-27 Aggregate view of sort and hash join overflows 

SELECT sort_heap_allocated, total_sorts, total_sort_time, sort_overflows, 
       hash_join_overflows, active_sorts 
FROM TABLE (SNAP_GET_DB('BCUKIT', -2)) 

SORT_HEAP_ALLOCATED TOTAL_SORTS TOTAL_SORT_TIME ... 

             496865          59            2380 ... 

... SORT_OVERFLOWS HASH_JOIN_OVERFLOWS ACTIVE_SORTS 

...              1                 107           20 

  1 record(s) selected. 
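
The overflow ratio mentioned earlier can be derived from this output. The 
following Python sketch is ours; the function name is an illustrative 
assumption. It computes the sort overflow ratio from the TOTAL_SORTS and 
SORT_OVERFLOWS values in Example 7-27. The same calculation applies to hash 
joins, but the total number of hash joins is not part of this example's output. 

```python
def overflow_ratio(overflows, total):
    """Return overflows as a fraction of the total, or None if nothing ran yet."""
    if total == 0:
        return None
    return overflows / total


# Values from Example 7-27: 59 sorts in total, 1 sort overflow
ratio = overflow_ratio(1, 59)
print(f"sort overflow ratio: {ratio:.1%}")
```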


In order to narrow down the statements performing the sorts or the hash joins, 
use the following methods: 

► Application snapshots provide a drill down of sort and hash joins activity per 
application. The snapshots also provide information about the SQL being 
executed. 

► MON_GET_CONNECTION can be used to obtain application level sort 
activity. 

► The MON_GET_PKG_CACHE_STMT relational monitoring function can also 
be used to obtain statement level detailed metrics on sort processing. 

Example 7-28 shows an example of MON_GET_PKG_CACHE_STMT usage to 
display sort summary information. 

Example 7-28 Display sort summary information using MON_GET_PKG_CACHE_STMT 

SELECT VARCHAR(SUBSTR(STMT_TEXT, 1, 50)) AS STMT, 
       MEMBER, TOTAL_SORTS, SORT_OVERFLOWS, POST_THRESHOLD_SORTS 
FROM TABLE (MON_GET_PKG_CACHE_STMT('D', NULL, NULL, -2)) 
WHERE TOTAL_SORTS > 0 ORDER BY STMT, MEMBER; 

STMT                                                 MEMBER TOTAL_SORTS ... 

SELECT VARCHAR(SUBSTR(STMT_TEXT, 1,100)) AS STMT, ME      0 
select c_custkey, c_name, sum(l_extendedprice *           3 
select l_orderkey, sum(l_extendedprice * (1 - l_          1 
select l_orderkey, sum(l_extendedprice * (1 - l_          2 
select l_orderkey, sum(l_extendedprice * (1 - l_          3 
select l_orderkey, sum(l_extendedprice * (1 - l_          4 
select l_orderkey, sum(l_extendedprice * (1 - l_          5 
select l_orderkey, sum(l_extendedprice * (1 - l_          6 
select l_orderkey, sum(l_extendedprice * (1 - l_          7 
select l_orderkey, sum(l_extendedprice * (1 - l_          8 
select l_returnflag, l_linestatus, sum(l_quanti           1 
select l_returnflag, l_linestatus, sum(l_quanti           2 
select l_returnflag, l_linestatus, sum(l_quanti           3 
select l_returnflag, l_linestatus, sum(l_quanti           4 
select l_returnflag, l_linestatus, sum(l_quanti           5 
select l_returnflag, l_linestatus, sum(l_quanti           6 
select l_returnflag, l_linestatus, sum(l_quanti           7 
select l_returnflag, l_linestatus, sum(l_quanti           8 

... SORT_OVERFLOWS POST_THRESHOLD_SORTS 


LOGBUFSZ 

LOGBUFSZ represents the size of the internal buffer used by the DB2 logger to 
store transaction log records. The default DB2 value of 256 is quite small for 
IBM Smart Analytics System environments, and has been increased to 2048. A 
higher value is necessary to ensure good performance during LOAD of 
multidimensional clustering (MDC) tables, as additional logging is performed 
for the MDC block maintenance. This value is sufficient for most workloads. 

UTIL_HEAP_SZ 

The utility heap is used by DB2 utilities such as BACKUP, RESTORE, or LOAD 
for allocating buffers. These utilities by default tune their heap usage themselves 
and adjust their individual heap usage based on the amount of memory available 
in UTIL_HEAP_SZ. 

Allocate sufficient space to the UTIL_HEAP_SZ so that these utilities perform 
well. The value has been increased to 65536 from the default value and is 
sufficient for most workloads. See the IBM Smart Analytics System User’s Guide 
for your respective version, for a detailed discussion about UTIL_HEAP_SZ 
sizing. 

CHNGPGS_THRESH 

CHNGPGS_THRESH represents the percentage of dirty pages in the buffer 
pool at which page cleaners are triggered to flush these pages. To limit 
the overhead associated with handling large lists of dirty pages, the value has 
been reduced for the 7700 because it has a large buffer pool. Lowering 
CHNGPGS_THRESH triggers the page cleaners earlier and more 
proactively. This approach helps when running write intensive workloads and 
specific utilities or operations such as REORG or CREATE INDEX. 

DFT_PREFETCH_SZ and NUM_IOSERVERS 

These parameters are related to prefetching. See “DB2_PARALLEL_IO” on 
page 219 for further discussion about these parameters. 


7.1.3 DB2 buffer pool and table spaces 

In the previous section, we discussed parameter configurations that are preset 
on the IBM Smart Analytics System. In this section, we discuss database objects 
such as the DB2 buffer pool and table spaces created on the IBM Smart 
Analytics System. 

DB2 buffer pool 

The default page size chosen for all IBM Smart Analytics System offerings is 
16K. All IBM Smart Analytics System offerings have two buffer pools: 

► IBMDEFAULTBP: The default buffer pool, with a size of 1000 pages, for the 
catalog tables. 

► BP16K: A large unified buffer pool for permanent and temporary table spaces. 

Table 7-7 shows the buffer pool sizes and block area sizes for the various IBM 
Smart Analytics System offerings. 

Table 7-7 Buffer pool and block area sizes 

BP16K buffer pool             5600 V1             5600 V2             7600     7700 
                              (with SSD option)   (with SSD option) 
Size (16K pages)              179200 (358400)     179200 (300000)     160000   300000 
Size (GB)                     2.73 (5.47)         2.73 (4.58)         2.44     4.58 
Block area size (16K pages)   N/A                 N/A                 16000    100000 
Block area size (GB)          N/A                 N/A                 0.24     1.53 
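
The GB figures in Table 7-7 are the page counts multiplied by the 16K page 
size. The following Python sketch is ours and simply reproduces that 
conversion; the function name is an illustrative assumption. 

```python
PAGE_SIZE_BYTES = 16 * 1024  # BP16K uses 16K pages
BYTES_PER_GB = 1024 ** 3


def pages_to_gb(pages):
    """Convert a number of 16K pages to GB, rounded as in Table 7-7."""
    return round(pages * PAGE_SIZE_BYTES / BYTES_PER_GB, 2)


# Page counts as listed in Table 7-7
for pages in (179200, 358400, 300000, 160000, 16000, 100000):
    print(pages, pages_to_gb(pages))
```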


One main difference between Linux-based and AIX-based offerings is the use of 
block areas. For analytical workloads, performance testing has shown that block 
I/O provides strong performance on the AIX platform, so the buffer pool has a 
dedicated block area for this purpose. On the Linux platform, vectored I/O, 
which is used by default for prefetching, provides strong performance. 

The IBM Smart Analytics System 7700 buffer pool is larger than the buffer pool 
of the IBM Smart Analytics System 7600. It also needs a larger block area 
because of its more aggressive prefetching and the higher I/O bandwidth 
available. 

The IBM Smart Analytics System family uses a large unified buffer pool as a 
starting point. Managing a single buffer pool provides good performance in 
most cases. Results can vary depending on your actual workload. You might also 
have particular page size requirements (for example, a need for 32K pages for 
large rows). You might then need to reduce the BP16K buffer pool and create 
additional buffer pools accordingly. 

A buffer pool snapshot provides detailed metrics about buffer pool activity. A 
key metric for monitoring buffer pools is the buffer pool hit ratio. You can 
use the MON_GET_BUFFERPOOL table function to obtain buffer pool hit ratio data 
for all the nodes in the cluster. Buffer pool metrics are discussed in 
“DB2 I/O metrics” on page 174. 
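
As a reminder of how the hit ratio is commonly derived, the following Python 
sketch (ours, not taken from the DB2 tooling) computes it from logical and 
physical read counters such as the POOL_DATA_L_READS and POOL_DATA_P_READS 
columns returned by MON_GET_BUFFERPOOL. The counter values used are 
hypothetical. 

```python
def hit_ratio(logical_reads, physical_reads):
    """Percentage of page requests satisfied without a physical read."""
    if logical_reads == 0:
        return None  # no activity yet, so the ratio is undefined
    return 100.0 * (logical_reads - physical_reads) / logical_reads


# Hypothetical counters for one member: 1000 logical reads, 50 physical reads
print(hit_ratio(1000, 50))
```

Large table scans in analytical workloads can legitimately lower this ratio, 
so it is best interpreted as a trend rather than against a fixed target. 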

DB2 table spaces 

In this section, we discuss aspects of the DB2 table space design for the IBM 
Smart Analytics System, as well as guidelines to create table spaces. 

Regular table spaces 

Recent IBM Smart Analytics System offerings use automatic storage by default. 
On each platform, the storage path points to the file system and LUNs designed 
for table space data. 

Table spaces not using automatic storage also need to have containers placed 
under the file systems designed for table space data. Create a single container 
per table space. For platforms where NUM_IOSERVERS is set to automatic, 
NUM_IOSERVERS is determined as: 

MAX(number of containers in the same stripe set from all your 
tablespaces) x DB2_PARALLEL_IO disk setting 
Note that, based on the previous formula, if you add a container to an existing 
table space or create a table space with multiple containers, the additional 
containers are added to the same stripe set by default, which increases the 
number of prefetchers. For example, in a 7600 configuration, which has two 
containers per table space, you might see 16 DB2 prefetchers per database 
partition. The number of prefetchers affects the prefetching performance on 
your system. 
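
The formula can be sketched as follows. This Python snippet is ours; the 
function name is an illustrative assumption, and the DB2_PARALLEL_IO disk 
setting of 8 is an assumed value chosen because it reproduces the 16 
prefetchers mentioned for the 7600. Check the actual registry setting on your 
own system. 

```python
def num_ioservers(containers_per_stripe_set, parallel_io_disks):
    """NUM_IOSERVERS when set to automatic, following the formula in the text.

    containers_per_stripe_set: for each table space, the largest number of
    containers that share one stripe set.
    parallel_io_disks: the disk setting of DB2_PARALLEL_IO (assumed value).
    """
    return max(containers_per_stripe_set) * parallel_io_disks


# Two containers in the same stripe set, assumed disk setting of 8
print(num_ioservers([2, 2, 1], 8))
# Single container per table space with the same assumed disk setting
print(num_ioservers([1, 1, 1], 8))
```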

All table spaces are created as database managed space (DMS) table spaces 
with the default NO FILE SYSTEM CACHING option, which enables the use of 
direct I/O (DIO) and concurrent I/O (CIO). DIO bypasses file system caching and 
copies the data directly from the disk to the buffer pool. DIO provides strong 
performance for DMS table spaces because it eliminates the overhead of looking 
up the data in the file system cache. It also eliminates the cost of copying 
the data twice: first from the disk to the file system cache, and then from 
the file system cache to the buffer pool. 

Concurrent I/O optimizes concurrent access to DMS container files. By default, 
JFS2 uses a file level exclusive i-node locking mechanism to serialize 
concurrent write access, which impacts the performance of multiple DB2 threads 
trying to read and write data concurrently to the same DMS container file. 
Concurrent I/O does not perform exclusive i-node locking systematically for all 
writes, but only in specific cases, which allows a greater level of concurrent 
access. Note that the use of DIO is implicit when using CIO. 

Table 7-8 contains a summary of the table spaces parameters for various IBM 
Smart Analytics System platforms. 


Table 7-8 Table space parameters 

Table space parameters   5600 V1 and 5600 V2   7600        7700 
EXTENT SIZE              32                    16          16 
PREFETCH SIZE            AUTO(160)             AUTO(128)   384 
OVERHEAD                 3.63                  4.0         4.0 
TRANSFERRATE             0.07                  0.4         0.04 

The EXTENT SIZE and PREFETCH SIZE parameters are discussed in 
“DB2_PARALLEL_IO” on page 219. 

DB2 Optimizer uses the OVERHEAD and TRANSFERRATE parameters to 
estimate the I/O cost during query compilation. The DB2 9.7 Information Center 
contains information about these parameters: 

http://publib.boulder.ibm.com/infocenter/db2luw/v9r7/topic/com.ibm.db2.luw.admin.perf.doc/doc/c0005051.html 

Any change to these parameters impacts the query I/O costing, which might 
change access plans for your queries. Do not change these parameters, unless 
there is strong evidence that a change will provide overall better performance for 
your entire query workload. 

Temporary table spaces 

For the IBM Smart Analytics System, use a DMS temporary table space because 
it provides good performance. IBM Smart Analytics System 5600 V1 and V2 with 
SSD, and 7700 use temporary table space containers located on SSD devices. 
The size of the SSD device might vary depending on your platform and 
configuration options. 

By default, all DB2 table spaces are on a single container located on a 
dedicated file system per partition, except for the temporary table space when 
SSD is available in your specific environment. 

By default, all DB2 containers of a table space are located on the same stripe 
set. When containers are located on the same stripe set, extents are allocated 
in a round robin fashion across the containers. When DB2 containers are created 
on separate stripe sets, extents are allocated sequentially (on one container 
first, then on the other one). 
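
The difference between the two allocation patterns can be illustrated with a 
small model. The following Python sketch is ours and is deliberately 
simplified: it ignores container tags and unequal container sizes within a 
stripe set, and it uses tiny hypothetical capacities (3 and 5 extents) rather 
than real container sizes. 

```python
def round_robin(extent, num_containers):
    """Containers in the same stripe set: extents alternate across containers."""
    return extent % num_containers


def sequential(extent, extents_per_container):
    """Containers in separate stripe sets: fill one container, then the next."""
    start = 0
    for container, capacity in enumerate(extents_per_container):
        if extent < start + capacity:
            return container
        start += capacity
    raise ValueError("extent is beyond the table space capacity")


same_stripe_set = [round_robin(e, 2) for e in range(6)]
separate_stripe_sets = [sequential(e, [3, 5]) for e in range(6)]
print(same_stripe_set)        # container used by extents 0..5
print(separate_stripe_sets)   # container used by extents 0..5
```

In the sequential case, the second container is touched only after the first 
one is full, which is why placing the SSD container in the first stripe set 
makes spills use the SSD first. 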

Figure 7-1 shows an example of two unique table spaces: 

► TEMPA16K has two containers residing on the same stripe set. This stripe 
setting is the most commonly used and is the default when a table space is 
created. Extents are allocated in a round robin fashion. 

► TEMPB16K has two containers on two unique stripe sets. Extents are 
allocated sequentially. This stripe setting is used for DMS temporary table 
spaces with SSD containers on the IBM Smart Analytics System in order to 
use the SSD container first. 

[Figure: Table spaces TEMPA16K and TEMPB16K each have Container 0 on file 
system /db2ssd/bcuaix and Container 1 on file system /db2fs/bcuaix. In 
TEMPA16K, consecutive extents (0, 1, 2, 3, ...) alternate between the two 
containers. In TEMPB16K, extents 0 through 376829 are on Container 0 and 
extents 376830 through 949997 are on Container 1.] 

Figure 7-1 Table spaces created on the same and unique stripe sets 


Example 7-29 shows the data definition language (DDL) to create these two 
table spaces. 

Example 7-29 DDL to create table spaces 


-- TABLESPACE TEMPA16K 
CREATE TEMPORARY TABLESPACE "TEMPA16K" 
IN DATABASE PARTITION GROUP IBMTEMPGROUP 
PAGESIZE 16384 MANAGED BY DATABASE 
USING (FILE '/db2ssd/bcuaix/ssd $N%8 /BCUDB/temp16k' 94208M, 
       FILE '/db2fs/bcuaix/NODE000 $N /BCUDB/temp16k' 143292M) 
ON DBPARTITIONNUMS (0 to 9) 
USING (FILE '/db2ssd/bcuaix/ssd $N%8 /BCUDB/temp16k' 94208M, 
       FILE '/db2fs/bcuaix/NODE00 $N /BCUDB/temp16k' 143292M) 
ON DBPARTITIONNUMS (10 to 16) 
EXTENTSIZE 16 
PREFETCHSIZE 384 
BUFFERPOOL BP16K 
OVERHEAD 4.0 
TRANSFERRATE 0.4 
NO FILE SYSTEM CACHING 
DROPPED TABLE RECOVERY OFF; 

-- TABLESPACE TEMPB16K 
CREATE TEMPORARY TABLESPACE TEMPB16K 
IN DATABASE PARTITION GROUP IBMTEMPGROUP 
PAGESIZE 16384 MANAGED BY DATABASE 
USING (FILE '/db2ssd/bcuaix/ssd $N%8 /BCUDB/temp16k' 94208M) 
ON DBPARTITIONNUMS (0 to 16) 
EXTENTSIZE 16 PREFETCHSIZE 384 
BUFFERPOOL BP16K OVERHEAD 4.0 
NO FILE SYSTEM CACHING TRANSFERRATE 0.4; 

COMMIT; 

ALTER TABLESPACE TEMPB16K 
BEGIN NEW STRIPE SET (FILE '/db2fs/bcuaix/NODE000 $N /BCUDB/temp16k' 143292M) 
ON DBPARTITIONNUMS (0 to 9) 
BEGIN NEW STRIPE SET (FILE '/db2fs/bcuaix/NODE00 $N /BCUDB/temp16k' 143292M) 
ON DBPARTITIONNUMS (10 to 16); 


A table space with the SSD container and the RAID disk container created on 
two unique stripe sets makes better use of the SSD performance benefits, 
because DB2 allocates all extents on the SSD container first, and then on the 
container on the RAID array. For this reason, the TEMP16K table space has a DDL 
identical to that of TEMPB16K in the previous example. 

Figure 7-2 shows the table space allocation on a 7700 platform with one 800 GB 
SSD RAID card. 

Note that table spaces on unique stripe sets can only be created with a 
CREATE TABLESPACE statement followed by an ALTER TABLESPACE statement. There 
is no syntax to create a table space on unique stripe sets with one statement. 


db2look: Currently (up to DB2 9.7 Fix Pack 3a), db2look does not recognize, 
during DDL extraction, table spaces on unique stripe sets. The DDL generated 
by db2look will locate both containers on the same stripe set. 


If you want to monitor your temporary table space spill usage, you can check the 
maximum high watermark for the temporary table space since the time the 
database was activated. Use the following methods to obtain the high water mark 
value: 

► db2pd -db <db-name> -tablespaces: Column MaxHWM. 

► MON_GET_TABLESPACE: This relational monitoring function contains a 
TBSP_MAX_PAGE_TOP column. 


Example 7-30 shows a db2pd output excerpt used to capture the Max HWM value. 
The output is limited to the data of interest. During the workload, the 
maximum HWM usage is 1,016,640 pages. The data listed under “Containers” shows 
that there are 6,029,312 pages in the first (SSD) container. So, the workload 
used about one sixth of the SSD container for spill purposes. 
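
The arithmetic behind this estimate can be checked as follows. This Python 
sketch is ours; the variable and function names are illustrative assumptions. 
It derives the SSD container size in 16K pages from the 94208M container 
defined in Example 7-29 and compares it with the maximum HWM reported by db2pd. 

```python
PAGE_KB = 16  # the temporary table space uses 16K pages


def mb_to_pages(size_mb):
    """Convert a container size in MB to a number of 16K pages."""
    return size_mb * 1024 // PAGE_KB


ssd_pages = mb_to_pages(94208)       # SSD container size from Example 7-29
max_hwm_pages = 1016640              # Max HWM reported by db2pd
fraction = max_hwm_pages / ssd_pages
spill_gb = max_hwm_pages * PAGE_KB / (1024 * 1024)

print(ssd_pages)
print(f"{fraction:.1%} of the SSD container, about {spill_gb:.1f} GB of spill")
```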

Example 7-30 Check high watermark using db2pd 

# db2pd -db bcudb -tablespaces 

Database Partition 1 -- Database BCUDB -- Active -- Up 0 days 00:50:34 

Tablespace Configuration: 
Address            Id  Type Content PageSz ExtentSz Auto Prefetch ... 
0x07000001502683C0 260 DMS  SysTmp  16384  16       No   384      ... 

... BufID BufIDDisk FSC NumCntrs MaxStripe LastConsecPg Name 
... 2     2         Off 2        1         15           TEMP16K 

Tablespace Statistics: 
Address            Id  Total Pgs UsablePgs UsedPgs PndFreePgs FreePgs  ... 
0x07000001502683C0 260 15200000  15199968  1016640 0          14183328 ... 

... HWM     Max HWM State      MinRecTime NQuiescers PathsDropped 
... 1016640 1016640 0x00000000 0          0          No 

Containers: 
Address            TspId ContainNum Type Total Pgs UseablePgs ... 
0x0700000150269880 260   0          File 6029312   6029296    ... 
0x0700000150269A90 260   1          File 9170688   9170672    ... 

... PathID StripeSet Container 
... 0 /db2ssd/bcuaix/ssd1/BCUDB/temp16k 
... 1 /db2fs/bcuaix/NODE0001/BCUDB/temp16k 


Example 7-31 shows an example of the MON_GET_TABLESPACE relational 
monitoring function. The column TBSP_MAX_PAGE_TOP shows the maximum 
high watermark usage for the TEMP16K table space. 

Example 7-31 Check high watermark using MON_GET_TABLESPACE 

SELECT SUBSTR(TBSP_NAME, 1, 20) AS TBSP_NAME, 
       TBSP_ID, MEMBER, TBSP_PAGE_TOP, 
       TBSP_MAX_PAGE_TOP 
FROM TABLE (MON_GET_TABLESPACE('', -2)) 
WHERE TBSP_NAME = 'TEMP16K' 
ORDER BY MEMBER 

TBSP_NAME  TBSP_ID  MEMBER  TBSP_PAGE_TOP  TBSP_MAX_PAGE_TOP 
TEMP16K        260       0             64                128 
TEMP16K        260       1         601184             601184 
TEMP16K        260       2         600896             600896 
TEMP16K        260       3         634496             634496 
TEMP16K        260       4         602144             602144 
TEMP16K        260       5         628480             628480 
TEMP16K        260       6         602688             602688 
TEMP16K        260       7         672928             672928 
TEMP16K        260       8         629728             629728 

  9 record(s) selected. 
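
When sizing temporary space, the partition with the largest high watermark is 
usually the one of interest. The following Python sketch is ours; it simply 
post-processes the TBSP_MAX_PAGE_TOP values copied from Example 7-31 to find 
that partition and convert its spill to GB. 

```python
PAGE_KB = 16  # TEMP16K uses 16K pages

# (MEMBER, TBSP_MAX_PAGE_TOP) pairs copied from the Example 7-31 output
max_page_top = [
    (0, 128), (1, 601184), (2, 600896), (3, 634496), (4, 602144),
    (5, 628480), (6, 602688), (7, 672928), (8, 629728),
]

# Pick the member with the highest maximum high watermark
peak_member, peak_pages = max(max_page_top, key=lambda row: row[1])
peak_gb = peak_pages * PAGE_KB / (1024 * 1024)

print(f"member {peak_member}: {peak_pages} pages, about {peak_gb:.2f} GB")
```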





Note that these values represent the maximum HWM since the database was 
activated. During testing, if you want to see the maximum temporary table space 
size that a specific workload needs, perform the following steps: 

1. Deactivate the database. 

2. Activate the database. 

3. Run the workload. 

4. Collect the monitoring data prior to deactivating the database. 


7.2 DB2 workload manager 

The DB2 workload manager provides a powerful, low overhead capability for 
controlling the DB2 activities in execution based on business priorities. You 
can use DB2 workload manager to tame system workload peaks, prevent 
overloading, and control the priority of workloads. You can easily integrate 
the DB2 workload manager with the AIX and Linux workload managers to provide 
cohesive management of the entire IBM Smart Analytics System. 

In this section, we demonstrate how to progressively configure DB2 workload 
manager to manage the workloads in an IBM Smart Analytics System. We use 
MARTS tables for our demonstration. The DDL scripts used in the examples for 
this section are provided in Appendix B, “Scripts for DB2 workload manager 
configuration” on page 299. 

For best practices in using DB2 workload manager, see: 
http://www.ibm.com/developerworks/data/bestpractices/workloadmanagement/ 

7.2.1 Working with DB2 workload manager 

There are two perspectives to take into account when designing a DB2 workload 
manager system: 

► Business perspective: Considers business requirements about the process, 
applications, objectives, and performance expectations. 

► System perspective: Reflects the realities of efficient system management. 

The challenge of workload management is to map the business perspective to 
the system perspective as illustrated in Figure 7-3. 



A good strategy is to regulate the incoming workloads according to business 
priority and then manage the system capacity as efficiently as possible. The 
goal is to control the demands from business applications by managing the 
number of concurrent accesses and the share of resources among the 
applications. 

The DB2 workload manager is able to identify who is submitting the workloads 
(by user or group, application, IP address, and other parameters). It can also 
determine what that user or application is to perform (for example, Data 
Manipulation Language (DML), DDL, stored procedures, or utilities). 

Using this information, the DB2 workload manager can group together users, 
roles, or applications (who) with similar business priorities into workloads. 
The type of operation to be carried out (what) can be used to instruct the work 
action set to take a pre-defined action. Figure 7-4 illustrates mapping the 
business function into workloads. 



DB2 workload manager manages the work by using DB2 workloads and DB2 
work action sets to place work into service classes where the work executes. The 
service class determines the priority and allocation of resources that the work 
receives during execution. Service classes have a two tier hierarchy; a service 
superclass contains one or more subclasses. A superclass always has a default 
subclass, and might have one to 64 user defined subclasses. 

DB2 provides DDL statements for creating workload management objects. The 
following SQL statements are used exclusively to create, alter, drop, or manage 
workload management objects: 

CREATE HISTOGRAM TEMPLATE, ALTER HISTOGRAM TEMPLATE or DROP 
(HISTOGRAM TEMPLATE) 

CREATE SERVICE CLASS, ALTER SERVICE CLASS, or DROP (SERVICE CLASS) 

CREATE THRESHOLD, ALTER THRESHOLD, or DROP (THRESHOLD) 

CREATE WORK ACTION SET, ALTER WORK ACTION SET, or DROP (WORK ACTION SET) 

CREATE WORK CLASS SET, ALTER WORK CLASS SET, or DROP (WORK CLASS SET) 

CREATE WORKLOAD, ALTER WORKLOAD, or DROP (WORKLOAD) 

GRANT (Workload Privileges) or REVOKE (Workload Privileges) 

Any workload management-exclusive SQL statements must be followed by a 
commit or rollback. Figure 7-5 shows DDL statements and workload 
management objects. You can use the GRANT and REVOKE statements to 
manage the privilege on a workload. 


[Figure: The Create, Alter, and Drop statements apply to the workload 
management objects: Service Class, Workload, Work Class set, Work Action set, 
Threshold, and Histogram Template.] 

Figure 7-5 Managing DB2 workload manager objects 


For more details about the DDL statements for DB2 workload manager, see the DB2 
Information Center at the following address: 

http://publib.boulder.ibm.com/infocenter/db2luw/v9r7/index.jsp?topic=/com.ibm.db2.luw.admin.wlm.doc/doc/r0051422.html 

You can use db2look with the -wlm option to generate WLM specific DDL 
statements. See the DB2 Information Center for details: 

http://publib.boulder.ibm.com/infocenter/db2luw/v9r7/index.jsp?topic=/com.ibm.db2.luw.admin.cmd.doc/doc/r0002051.html 

DB2 provides table functions and routines for managing DB2 workload manager 
data, as follows: 

► Workload management administrative SQL routines (see Table 18): 
http://publib.boulder.ibm.com/infocenter/db2luw/v9r7/index.jsp?topic=/com.ibm.db2.luw.sql.rtn.doc/doc/r0023485.html 

► Monitoring and intervention: 
http://publib.boulder.ibm.com/infocenter/db2luw/v9r7/index.jsp?topic=/com.ibm.db2.luw.admin.wlm.doc/doc/c0052600.html 

These articles have good information about DB2 workload manager: 

► DB2 9.7: Using Workload Manager Features 
http://www.ibm.com/developerworks/data/tutorials/dm-0908db2workload/index.html 

► Smart Data Administration e-Kit Article on DB2 Workload Management 
Histograms (3 Parts). 
http://www.ibm.com/developerworks/data/kits/dbakit/index.html 


7.2.2 Configuring a DB2 workload manager for an IBM Smart 
Analytics System 

To have a consistent plan for configuring the DB2 workload manager, 
perform the configuration progressively, starting by monitoring and 
understanding the activities in the database. After you have learned the 
characteristics of the workloads, you can gradually tune the DB2 workload 
manager configuration to set and enforce limits for each group of workloads to 
obtain system stability. 

Default DB2 workload manager environment 

Starting in Version 9.5, all DB2 server installations come with workload 
manager activated, although no action is taken and no statistics are captured 
for the work being executed. Any connection made to a DB2 database is assigned 
to a DB2 workload, and any user request submitted by that connection is 
considered part of that DB2 workload. DB2 considers all work within a workload 
as coming from a common source, so it can be treated as a common set of work. 
If a connection does not match any user defined DB2 workload, it is assigned to 
the default workload. 

After a DB2 installation, there are three default service superclasses: 

► SYSDEFAULTUSERCLASS: For user workloads 

► SYSDEFAULTSYSTEMCLASS: For special system level tasks 

► SYSDEFAULTMAINTENANCECLASS: For maintenance works such as 
statistics gathering or table reorganization 

Initially, the DB2 workload manager is activated but not configured, so no user or 
application will be identified. Therefore, all requests will be handled by the default 
workload, which is mapped to the default user service subclass. 

Figure 7-6 illustrates the default DB2 workload manager environment. 



For simplicity, we do not show the DefaultMaintenanceServiceClass and 
DefaultSystemServiceClass in the graphics shown in this section. 

Untuned DB2 workload manager environment 

In this section we describe how to set up a simple, untuned DB2 workload 
manager configuration that lets you monitor your workload. In a later section, we 
describe how to tune this configuration so that you can begin controlling your 
workload using the information that you obtained from monitoring it. The 
configuration of the untuned DB2 workload manager environment is as follows: 

► One new superclass (named MAIN, for example) 

► Six subclasses within the newly created user superclass (ETL, Trivial, Minor, 
Simple, Medium, and Complex) 

► A work class set for redirecting the incoming workloads to the appropriate 
service subclass based on SQL cost and workload type. 

► Remapping of the default workload from the default user service class to the 
new MAIN service superclass. 

Figure 7-7 illustrates the untuned DB2 workload manager environment, where 
the workloads are assigned to the service superclass MAIN that has six 
subclasses. The default workload object is not changed. 

Figure 7-7 Untuned DB2 workload manager environment 


Table 7-9 contains the suggested configuration for the workloads, including 
timeron ranges, estimated execution time, and the maximum allowable elapsed 
execution time for this environment. 


Table 7-9 Initial configuration for workload management 


Threshold actions 
In this section, we demonstrate how to set up the untuned DB2 workload 
manager environment for an IBM Smart Analytics System with these parameters. 
All the scripts are provided in Appendix B, “Scripts for DB2 workload manager 
configuration” on page 299. 

Creating service classes 

Here we create the service superclass MAIN and the service subclasses ETL, 
Trivial, Minor, Simple, Medium, and Complex. We connect to our database and 
run the DDL script 01_create_svc_classes.sql using the following command: 

db2 -vtf 01_create_svc_classes.sql 

Example 7-32 shows the new service classes created. 

Example 7-32 Service classes created 

SELECT VARCHAR(serviceclassname,30) AS SvcClass_name, 
VARCHAR(parentserviceclassname,30) AS Parent_Class_name 
FROM syscat.serviceclasses 
WHERE parentserviceclassname = 'MAIN' 

SVCCLASS_NAME                  PARENT_CLASS_NAME 

COMPLEX                        MAIN 
ETL                            MAIN 
MEDIUM                         MAIN 
SIMPLE                         MAIN 
SYSDEFAULTSUBCLASS             MAIN 
MINOR                          MAIN 
TRIVIAL                        MAIN 

7 record(s) selected. 


For more details about the CREATE SERVICE CLASS statement, see the DB2 
Information Center at this address: 

http://publib.boulder.ibm.com/infocenter/db2luw/v9r7/index.jsp?topic=/com.ibm.db2.luw.sql.ref.doc/doc/r0050550.html 

After the new MAIN superclass and its subclasses are created, we will remap the 
default workload from SYSDEFAULTUSERCLASS to the new MAIN service 
superclass, by altering SYSDEFAULTUSERWORKLOAD. 

Example 7-33 shows the workload name, the service subclass name, the service 
superclass name, and the workload evaluation order, before and after 
remapping. The script for this task is 02_remap_dft_wkl.sql. 
Example 7-33 Remapping the default workload 

db2admin@node01:~/WLM> db2 -vtf 02_remap_dft_wkl.sql 
Original defaultUSERworkload mapping 

select varchar(workloadname,25) as Workload_name, 
varchar(serviceclassname,20) as SvClass_name, 
varchar(parentserviceclassname,20) as Parent_Class_name, EvaluationOrder 
as Eval_Order FROM syscat.workloads ORDER by 4 

WORKLOAD_NAME             SVCLASS_NAME         PARENT_CLASS_NAME    EVAL_ORDER 

SYSDEFAULTUSERWORKLOAD    SYSDEFAULTSUBCLASS   SYSDEFAULTUSERCLASS  1 
SYSDEFAULTADMWORKLOAD     SYSDEFAULTSUBCLASS   SYSDEFAULTUSERCLASS  2 

2 record(s) selected. 

alter workload SYSDEFAULTUSERWORKLOAD SERVICE CLASS MAIN 
DB20000I The SQL command completed successfully. 

commit 
DB20000I The SQL command completed successfully. 

Remapped defaultUSERworkload - 

select varchar(workloadname,25) as Workload_name, 
varchar(serviceclassname,20) as SvClass_name, 
varchar(parentserviceclassname,20) as Parent_Class_name, EvaluationOrder 
as Eval_Order FROM syscat.workloads ORDER by 4 

WORKLOAD_NAME             SVCLASS_NAME         PARENT_CLASS_NAME    EVAL_ORDER 

SYSDEFAULTUSERWORKLOAD    MAIN                 -                    1 
SYSDEFAULTADMWORKLOAD     SYSDEFAULTSUBCLASS   SYSDEFAULTUSERCLASS  2 

2 record(s) selected. 


MAIN service class: Because the default workload is now mapped to the 
MAIN service class, do not disable or drop the MAIN service class. Otherwise, 
all data access to the database will be interrupted. If you must disable or drop 
the MAIN service class, first remap the default workload to the original 
SYSDEFAULTUSERCLASS, using the following statement: 

ALTER WORKLOAD SYSDEFAULTUSERWORKLOAD SERVICE CLASS SYSDEFAULTUSERCLASS 


Creating work class sets and work action sets 

Work action sets analyze an incoming workload and send it to a pre-defined 
service subclass, based on a number of conditions: 

► Work type (READ, WRITE, CALL, DML, DDL, LOAD, and ALL) 

► Timeron cost 

► Cardinality 

► Schema names (for CALL statements only) 

Work action sets work hand-in-hand with work class sets. A work class set 
defines the conditions to be evaluated and a work action set references a work 
class set to operate. 

Example 7-34 shows the DDL statements (03_create_wk_action_set.sql) for 
creating work class sets and work action sets for the untuned environment 
configuration using the criteria set in Table 7-9 on page 249. 

Example 7-34 Work class sets and work action sets DDL 

CREATE WORK CLASS SET "WORK_CLASS_SET_1" 
( 
WORK CLASS "WCLASS_TRIVIAL" WORK TYPE DML FOR TIMERONCOST FROM 0 TO 5000 POSITION AT 1, 
WORK CLASS "WCLASS_MINOR" WORK TYPE DML FOR TIMERONCOST FROM 5000 TO 30000 POSITION AT 2, 
WORK CLASS "WCLASS_SIMPLE" WORK TYPE DML FOR TIMERONCOST FROM 30000 TO 300000 POSITION AT 3, 
WORK CLASS "WCLASS_MEDIUM" WORK TYPE DML FOR TIMERONCOST FROM 300000 TO 5000000 POSITION AT 4, 
WORK CLASS "WCLASS_COMPLEX" WORK TYPE DML FOR TIMERONCOST FROM 5000000 TO UNBOUNDED POSITION AT 5, 
WORK CLASS "WCLASS_ETL" WORK TYPE LOAD POSITION AT 6, 
WORK CLASS "WCLASS_OTHER" WORK TYPE ALL POSITION AT 7 
); 

commit; 

CREATE WORK ACTION SET "WORK_ACTION_SET_1" FOR SERVICE CLASS "MAIN" USING WORK CLASS SET 
"WORK_CLASS_SET_1" 
( 
WORK ACTION "WACTION_TRIVIAL" ON WORK CLASS "WCLASS_TRIVIAL" MAP ACTIVITY WITHOUT NESTED TO "TRIVIAL", 
WORK ACTION "WACTION_MINOR" ON WORK CLASS "WCLASS_MINOR" MAP ACTIVITY WITHOUT NESTED TO "MINOR", 
WORK ACTION "WACTION_SIMPLE" ON WORK CLASS "WCLASS_SIMPLE" MAP ACTIVITY WITHOUT NESTED TO "SIMPLE", 
WORK ACTION "WACTION_MEDIUM" ON WORK CLASS "WCLASS_MEDIUM" MAP ACTIVITY WITHOUT NESTED TO "MEDIUM", 
WORK ACTION "WACTION_COMPLEX" ON WORK CLASS "WCLASS_COMPLEX" MAP ACTIVITY WITHOUT NESTED TO "COMPLEX", 
WORK ACTION "WACTION_ETL" ON WORK CLASS "WCLASS_ETL" MAP ACTIVITY WITHOUT NESTED TO "ETL" 
); 


For details about the CREATE WORK CLASS SET statement, see the DB2 
Information Center at: 

http://publib.boulder.ibm.com/infocenter/db2luw/v9r7/index.jsp?topic=/com.ibm.db2.luw.sql.ref.doc/doc/r0050577.html 

For details about the CREATE WORK ACTION SET statement, see the DB2 
Information Center at: 

http://publib.boulder.ibm.com/infocenter/db2luw/v9r7/index.jsp?topic=/com.ibm.db2.luw.sql.ref.doc/doc/r0050576.html 
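
The routing defined by the work class set and work action set in Example 7-34 
can be pictured with a short Python sketch. It is an illustration only, not 
how DB2 evaluates work classes internally. It assumes that work classes are 
checked in POSITION order, that the first match wins, and that both ends of a 
timeron range are inclusive; the function name route and the simplified work 
type argument are hypothetical. 

```python
from typing import Optional

# Work classes in POSITION order, copied from Example 7-34.
# Entry layout: (work class, work type, low timerons, high timerons, target subclass).
# high=None stands for UNBOUNDED; target=None means no work action maps the class.
WORK_CLASSES = [
    ("WCLASS_TRIVIAL", "DML", 0, 5000, "TRIVIAL"),
    ("WCLASS_MINOR", "DML", 5000, 30000, "MINOR"),
    ("WCLASS_SIMPLE", "DML", 30000, 300000, "SIMPLE"),
    ("WCLASS_MEDIUM", "DML", 300000, 5000000, "MEDIUM"),
    ("WCLASS_COMPLEX", "DML", 5000000, None, "COMPLEX"),
    ("WCLASS_ETL", "LOAD", None, None, "ETL"),
    ("WCLASS_OTHER", "ALL", None, None, None),
]

DEFAULT_SUBCLASS = "SYSDEFAULTSUBCLASS"


def route(work_type: str, timeron_cost: Optional[float] = None) -> str:
    """Return the MAIN subclass an activity would be mapped to (assumed model)."""
    for _name, wtype, low, high, target in WORK_CLASSES:
        # The work type must match, unless the work class accepts ALL types.
        if wtype not in ("ALL", work_type):
            continue
        # Classes with a timeron range only match costs inside that range
        # (both bounds treated as inclusive, which is an assumption here).
        if low is not None:
            if timeron_cost is None or timeron_cost < low:
                continue
            if high is not None and timeron_cost > high:
                continue
        # WCLASS_OTHER has no work action, so the activity is not remapped.
        return target if target is not None else DEFAULT_SUBCLASS
    return DEFAULT_SUBCLASS


if __name__ == "__main__":
    for work_type, cost in [("DML", 1200), ("DML", 45000), ("LOAD", None), ("DDL", None)]:
        print(work_type, cost, "->", route(work_type, cost))
```

In this model, work that matches only WCLASS_OTHER stays in the default 
subclass of MAIN, because Example 7-34 defines no work action for that class. 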

Workloads: So far we have created a framework only. We have not implemented 
any workload controls. For now, we only identify and monitor the workloads. 
Nothing is prevented from executing. The controls are implemented in the 
next stage, the tuned DB2 workload manager environment. 


Preparing for monitoring 

The untuned DB2 workload manager environment is ready and we can start 
collecting data. To monitor the system, use event monitors. There are three DB2 
workload manager related event monitors: 

► Statistics event monitor: For capturing histograms, counts, and high 
watermarks 

► Activity event monitor: For capturing details about activities in a workload or 
service class 

► Threshold event monitor: For capturing details about threshold violations 

Even though the data collected by event monitors can be sent to a pipe or to a 
file, the output type chosen for IBM Smart Analytics System is the table output, 
so you can easily access the data for historical analysis. 

Create a separate, dedicated table space to store the event monitor tables. This 
table space must span all database partitions. Otherwise, event monitor data will 
be lost in the partitions with no event monitor tables. In our example, we create a 
table space TS_WLM_MON as shown in Example 7-35 using the script 
04_create_wlm_tablespace.sql. 

Example 7-35 Table space creation 

db2admin@node01:~/WLM> db2 -vtf 04_create_wlm_tablespace.sql 

CREATE TABLESPACE TS_WLM_MON MAXSIZE 2G 
DB20000I The SQL command completed successfully. 

COMMIT 
DB20000I The SQL command completed successfully. 


Use the DB2-provided script, ~/sqllib/misc/wlmevmon.ddl, to create and 
activate the event monitors. Modify the script to reflect the name of the table 
space created for event monitoring. 

Example 7-36 shows the script 05_wlmevmon.ddl, which is used to 
create the three event monitors, DB2ACTIVITIES, DB2STATISTICS, and 
DB2THRESHOLDVIOLATIONS, as well as all the necessary tables for these 
DB2 workload manager related monitors. Table space TS_WLM_MON is used. 

Example 7-36 05_wlmevmon.ddl script 

-- Set autocommit off 
UPDATE COMMAND OPTIONS USING C OFF; 

-- Define the activity event monitor named DB2ACTIVITIES 
CREATE EVENT MONITOR DB2ACTIVITIES 
FOR ACTIVITIES 
WRITE TO TABLE 
ACTIVITY (TABLE ACTIVITY_DB2ACTIVITIES 
IN TS_WLM_MON 
PCTDEACTIVATE 100), 
ACTIVITYSTMT (TABLE ACTIVITYSTMT_DB2ACTIVITIES 
IN TS_WLM_MON 
PCTDEACTIVATE 100), 
ACTIVITYVALS (TABLE ACTIVITYVALS_DB2ACTIVITIES 
IN TS_WLM_MON 
PCTDEACTIVATE 100), 
CONTROL (TABLE CONTROL_DB2ACTIVITIES 
IN TS_WLM_MON 
PCTDEACTIVATE 100) 
AUTOSTART; 

-- Define the statistics event monitor named DB2STATISTICS 
CREATE EVENT MONITOR DB2STATISTICS 
FOR STATISTICS 
WRITE TO TABLE 
SCSTATS (TABLE SCSTATS_DB2STATISTICS 
IN TS_WLM_MON 
PCTDEACTIVATE 100), 
WCSTATS (TABLE WCSTATS_DB2STATISTICS 
IN TS_WLM_MON 
PCTDEACTIVATE 100), 
WLSTATS (TABLE WLSTATS_DB2STATISTICS 
IN TS_WLM_MON 
PCTDEACTIVATE 100), 
QSTATS (TABLE QSTATS_DB2STATISTICS 
IN TS_WLM_MON 
PCTDEACTIVATE 100), 
HISTOGRAMBIN (TABLE HISTOGRAMBIN_DB2STATISTICS 
IN TS_WLM_MON 
PCTDEACTIVATE 100), 
CONTROL (TABLE CONTROL_DB2STATISTICS 
IN TS_WLM_MON 
PCTDEACTIVATE 100) 
AUTOSTART; 

-- Define the threshold violation event monitor named DB2THRESHOLDVIOLATIONS 
CREATE EVENT MONITOR DB2THRESHOLDVIOLATIONS 
FOR THRESHOLD VIOLATIONS 
WRITE TO TABLE 
THRESHOLDVIOLATIONS (TABLE THRESHOLDVIOLATIONS_DB2THRESHOLDVIOLATIONS 
IN TS_WLM_MON 
PCTDEACTIVATE 100), 
CONTROL (TABLE CONTROL_DB2THRESHOLDVIOLATIONS 
IN TS_WLM_MON 
PCTDEACTIVATE 100) 
AUTOSTART; 

-- Commit work 
COMMIT WORK; 


For more details about creating event monitors, see the DB2 Information Center: 

► CREATE EVENT MONITOR (activities) statement: 

http://publib.boulder.ibm.com/infocenter/db2luw/v9r7/index.jsp?topic=/com.ibm.db2.luw.sql.ref.doc/doc/r0055061.html 

► CREATE EVENT MONITOR (statistics) statement: 

http://publib.boulder.ibm.com/infocenter/db2luw/v9r7/index.jsp?topic=/com.ibm.db2.luw.sql.ref.doc/doc/r0055062.html 

► CREATE EVENT MONITOR (threshold violations) statement: 

http://publib.boulder.ibm.com/infocenter/db2luw/v9r7/index.jsp?topic=/com.ibm.db2.luw.sql.ref.doc/doc/r0055063.html 

Starting monitoring 

To monitor the system, you need to activate the event monitors. Example 7-37 
shows how to activate the event monitors. 

Example 7-37 Starting the event monitors 

db2admin@node01:~/WLM> db2 -vtf 06_start_evt_monitors.sql 

Monitor switches status 

SELECT substr(evmonname,1,30) as evmonname, CASE WHEN 
event_mon_state(evmonname) = 0 THEN 'Inactive' WHEN event_mon_state(evmonname) 
= 1 THEN 'Active' END as STATUS FROM syscat.eventmonitors 

EVMONNAME                      STATUS 

DB2ACTIVITIES                  Inactive 
DB2DETAILDEADLOCK              Active 
DB2STATISTICS                  Inactive 
DB2THRESHOLDVIOLATIONS         Inactive 

4 record(s) selected. 

set event monitor db2activities state 1 
DB20000I The SQL command completed successfully. 

set event monitor db2statistics state 1 
DB20000I The SQL command completed successfully. 

set event monitor db2thresholdviolations state 1 
DB20000I The SQL command completed successfully. 

Monitor switches status 

SELECT substr(evmonname,1,30) as evmonname, CASE WHEN 
event_mon_state(evmonname) = 0 THEN 'Inactive' WHEN event_mon_state(evmonname) 
= 1 THEN 'Active' END as STATUS FROM syscat.eventmonitors 

EVMONNAME                      STATUS 

DB2ACTIVITIES                  Active 
DB2DETAILDEADLOCK              Active 
DB2STATISTICS                  Active 
DB2THRESHOLDVIOLATIONS         Active 

4 record(s) selected. 


The event monitors collect information about the workloads and hold it in 
memory rather than writing it directly to tables. For the statistics event 
monitor, the statistics are flushed to the tables periodically. Use the database 
configuration parameter WLM_COLLECT_INT to set the interval, in minutes, at 
which the in-memory statistics are flushed to the tables. Typical values are 30 
or 60 minutes. The default is 0, which means that in-memory data is never 
written to the tables. 

You can also write the in-memory statistics to the tables manually using the procedure 
WLM_COLLECT_STATS(). The WLM_COLLECT_STATS procedure gathers 
statistics for service classes, workloads, work classes, and threshold queues and 
writes them to the statistics event monitor. The procedure also resets the 
statistics for service classes, workloads, work classes, and threshold queues. If 
there is no active statistics event monitor, the procedure only resets the statistics. 

For more information about the WLM_COLLECT_STATS procedure, see: 

http://publib.boulder.ibm.com/infocenter/db2luw/v9r7/index.jsp?topic=/com.ibm.db2.luw.sql.rtn.doc/doc/r0052005.html 
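
The collect-and-reset behavior described above can be modeled with a small 
Python sketch. This is a conceptual illustration under simplifying 
assumptions, not DB2 code: the class StatisticsCollector and its methods are 
hypothetical names, and a single counter per subclass stands in for the full 
set of statistics. 

```python
class StatisticsCollector:
    """Toy model of in-memory WLM statistics and a statistics event monitor."""

    def __init__(self, monitor_active: bool) -> None:
        self.monitor_active = monitor_active
        self.in_memory = {}   # subclass name -> completed activity count
        self.table_rows = []  # rows "written" to the event monitor table

    def record_completion(self, subclass: str) -> None:
        # Statistics accumulate in memory between collections.
        self.in_memory[subclass] = self.in_memory.get(subclass, 0) + 1

    def collect_stats(self) -> None:
        # With an active statistics event monitor, the values are written out.
        if self.monitor_active:
            self.table_rows.extend(sorted(self.in_memory.items()))
        # In either case, the in-memory statistics are reset.
        self.in_memory = {}


if __name__ == "__main__":
    collector = StatisticsCollector(monitor_active=True)
    collector.record_completion("TRIVIAL")
    collector.record_completion("TRIVIAL")
    collector.record_completion("MINOR")
    collector.collect_stats()
    print(collector.table_rows)
    print(collector.in_memory)
```

The second case in the model, a collector created with monitor_active=False, 
shows why statistics are lost if the procedure runs while no statistics event 
monitor is active: the counters are reset but nothing is written. 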

Testing the environment: Work action sets 

In this section, we demonstrate how to verify that the service classes were created 
correctly. We use a script to display the existing service superclasses and 
subclasses, execute queries of various timeron costs, and list the workload 
executed by subclass so we can verify where each of the queries was executed. 

Before running this verification test, we must reset the workload manager 
statistics so we can see the results easily. Because the reset command is 
asynchronous, wait a few seconds for the counters to be zeroed before running 
the script. 

Example 7-38 shows the result of the verification script. In this example, to save 
time, we commented out the complex query in the script. To include the 
complex query, uncomment the query in the script 07_execs_by_subclasses.sql. 


Example 7-38 Executions by subclasses script 

db2admin@node01:~/WLM> db2 -vtf 07_execs_by_subclasses.sql 

============== Workloads executed by Subclasses =================== 

SELECT VARCHAR( SERVICE_SUPERCLASS_NAME, 20) SUPERCLASS, VARCHAR( SERVICE_SUBCLASS_NAME, 
20) SUBCLASS, COORD_ACT_COMPLETED_TOTAL FROM 
TABLE(WLM_GET_SERVICE_SUBCLASS_STATS_V97('','',-1)) AS T WHERE SERVICE_SUPERCLASS_NAME 
like 'MAIN%' 

SUPERCLASS           SUBCLASS             COORD_ACT_COMPLETED_TOTAL 

MAIN                 SYSDEFAULTSUBCLASS   0 
MAIN                 TRIVIAL              0 
MAIN                 MINOR                0 
MAIN                 SIMPLE               0 
MAIN                 MEDIUM               0 
MAIN                 COMPLEX              0 
MAIN                 ETL                  0 

7 record(s) selected. 

executing queries... 

===== query to be mapped to the TRIVIAL service subclass 
select count(*) from MARTS.PRODUCT 

35259 

1 record(s) selected. 

===== query to be mapped to the MINOR service subclass == 
select count(*) from MARTS.time, MARTS.time, MARTS.store 

18248384 

1 record(s) selected. 

===== query to be mapped to the EASY service subclass ===== 
select count(*) from MARTS.PRCHS_PRFL_ANLYSIS, MARTS.TIME, MARTS.STORE 

1560605200 

1 record(s) selected. 

===== query to be mapped to the MEDIUM service subclass ===== 
select count_big(*) from MARTS.PRCHS_PRFL_ANLYSIS, MARTS.PRCHS_PRFL_ANLYSIS 

12133022500. 

1 record(s) selected. 

===== query to be mapped to the COMPLEX service subclass ===== 

============== Workloads executed by Subclasses =================== 

SELECT VARCHAR( SERVICE_SUPERCLASS_NAME, 20) SUPERCLASS, VARCHAR( SERVICE_SUBCLASS_NAME, 
20) SUBCLASS, COORD_ACT_COMPLETED_TOTAL FROM 
TABLE(WLM_GET_SERVICE_SUBCLASS_STATS_V97('','',-1)) AS T WHERE SERVICE_SUPERCLASS_NAME 
like 'MAIN%' 

SUPERCLASS           SUBCLASS             COORD_ACT_COMPLETED_TOTAL 

MAIN                 SYSDEFAULTSUBCLASS 
MAIN                 TRIVIAL 
MAIN                 MINOR 
MAIN                 SIMPLE 
MAIN                 MEDIUM 
MAIN                 COMPLEX 
MAIN                 ETL 

0 

7 record(s) selected. 


Timeron: A timeron is a DB2 internal measure of the cost of executing an 
SQL query. Because it takes into account system characteristics such as 
CPU speed, hard disk speed, available memory, and many others, the timeron 
count might vary between systems, even for the same query. 


We use another script (08_etl_subclass.sql) to run the load operation to verify 
the ETL service subclass. Reset the counters and wait a few seconds before 
executing the script. Example 7-39 shows the results of our test. 
Example 7-39 Testing the ETL service subclass 

db2admin@node01:~/WLM> db2 call wlm_collect_stats 

Return Status = 0 

db2admin@node01:~/WLM> db2 -vtf 08_etl_subclass.sql 
create table db2admin.PRODUCT like marts.product 
DB20000I The SQL command completed successfully. 

declare mycursor cursor for select * from marts.product 
DB20000I The SQL command completed successfully. 

load from mycursor of cursor replace into db2admin.product 

SQL3501W The table space(s) in which the table resides will not be placed in backup pending state 
since forward recovery is disabled for the database. 

SQL1193I The utility is beginning to load data from the SQL statement " select * from 
marts.product". 

SQL3500W The utility is beginning the "LOAD" phase at time "10/25/2010 17:55:51.138355". 

SQL3519W Begin Load Consistency Point. Input record count = "0". 

SQL3520W Load Consistency Point was successful. 

SQL3110N The utility has completed processing. "35259" rows were read from the input file. 

SQL3519W Begin Load Consistency Point. Input record count = "35259". 

SQL3520W Load Consistency Point was successful. 

SQL3515W The utility has finished the "LOAD" phase at time "10/25/2010 17:55:51.385153". 

Number of rows read = 35259 
Number of rows skipped = 0 
Number of rows loaded = 35259 
Number of rows rejected = 0 
Number of rows deleted = 0 
Number of rows committed = 35259 

drop table db2admin.product 
DB20000I The SQL command completed successfully. 

================== Executed workloads status ========================== 

SELECT VARCHAR( SERVICE_SUPERCLASS_NAME, 30) SUPERCLASS, VARCHAR( SERVICE_SUBCLASS_NAME, 20) 
SUBCLASS, COORD_ACT_COMPLETED_TOTAL FROM TABLE(WLM_GET_SERVICE_SUBCLASS_STATS_V97('','',-1)) AS T 
WHERE SERVICE_SUPERCLASS_NAME like 'MAIN%' 

SUPERCLASS                     SUBCLASS             COORD_ACT_COMPLETED_TOTAL 

MAIN                           SYSDEFAULTSUBCLASS 
MAIN                           ETL 
MAIN                           TRIVIAL 
MAIN                           MINOR 
MAIN                           SIMPLE 
MAIN                           MEDIUM 
MAIN                           COMPLEX 

7 record(s) selected. 
We can see that the work action set correctly redirected the workloads to the 
corresponding subclasses. You might also have noticed that during the load 
operation, other executions were made and were shown in the 
SYSDEFAULTSUBCLASS and TRIVIAL subclasses. 

Testing the environment: Concurrency 

Next we monitor how many queries ran concurrently in each subclass. You can 
use your own workloads or use the scripts we provide to send concurrent 
queries to the database, and then check the monitors to see what happened. 

We create two files with the queries to be run on the database. Example 7-40 
shows the content of query_easy.sql. 

Example 7-40 Query for the SIMPLE service subclass (query_easy.sql) 

select count(*) as Easy from MARTS.PRCHS_PRFL_ANLYSIS, MARTS.TIME, MARTS.STORE; 


Example 7-41 shows the query in query_minor.sql. 

Example 7-41 Query for the MINOR service subclass (query_minor.sql) 

select count(*) as Minor from MARTS.PRCHS_PRFL_ANLYSIS, MARTS.TIME; 


For testing in UNIX environments, the db2batch utility is used to run these 
queries. Example B-15 on page 311 shows the db2batch script. Do not use the 
RUNSTATS command to update the database statistics prior to running the test. 
Example 7-42 shows the output of our script (09_conc_exec_Unix.sh) that starts 
the foregoing queries concurrently. 

Example 7-42 Running the queries 

db2admin@node01:~/WLM> ./09_conc_exec_Unix.sh 
db2admin@node01:~/WLM> * Timestamp: Tue Oct 26 2010 08:10:25 CDT 
* Timestamp: Tue Oct 26 2010 08:10:26 CDT 
* Timestamp: Tue Oct 26 2010 08:10:26 CDT 
* Timestamp: Tue Oct 26 2010 08:10:26 CDT 
* Timestamp: Tue Oct 26 2010 08:10:26 CDT 

* SQL Statement Number 1: 

SELECT COUNT (*) as Easy FROM empmdc, empmdc, suppliers ; 

* Timestamp: Tue Oct 26 2010 08:10:28 CDT 

* SQL Statement Number 1: 

SELECT COUNT (*) as Minor FROM empmdc, empmdc ; 

(execution continues...) 


Even though the db2batch output seems to indicate that the queries are being 
run serially, they were actually run in parallel. You can confirm this by 
checking the AIX or Linux process status while the script is being executed 
(Example 7-43). 


Example 7-43 Parallel execution 

db2admin@node01:~/WLM> ps -ef | grep db2admin 
db2admin 29100     1 0 10:29 pts/0 00:00:00 db2batch -d sample -f query_minor.sql -a db2admin/ibm2blue -time off 
db2admin 29101     1 1 10:29 pts/0 00:00:00 db2batch -d sample -f query_Minor.sql -a db2admin/ibm2blue -time off 
db2admin 29102     1 0 10:29 pts/0 00:00:00 db2batch -d sample -f query_Minor.sql -a db2admin/ibm2blue -time off 
db2admin 29103     1 0 10:29 pts/0 00:00:00 db2batch -d sample -f query_Minor.sql -a db2admin/ibm2blue -time off 
db2admin 29104     1 0 10:29 pts/0 00:00:00 db2batch -d sample -f query_easy.sql -a db2admin/ibm2blue -time off 
db2admin 29105     1 0 10:29 pts/0 00:00:00 db2batch -d sample -f query_easy.sql -a db2admin/ibm2blue -time off 
db2admin 29106     1 0 10:29 pts/0 00:00:00 db2batch -d sample -f query_easy.sql -a db2admin/ibm2blue -time off 
db2admin 29145 12653 0 10:30 pts/1 00:00:00 ps -ef 
db2admin 29146 12653 0 10:30 pts/1 00:00:00 grep db2admin 
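
The effect of starting several db2batch processes at once can be sketched in 
Python. This is a minimal illustration, not the 09_conc_exec_Unix.sh script 
from Appendix B. The db2batch options are the ones visible in the process 
list in Example 7-43; the helper names db2batch_command and run_concurrently 
are hypothetical, and the demonstration launches a harmless stand-in command 
so that it runs without a DB2 installation. 

```python
import subprocess
import sys


def db2batch_command(database, query_file, user, password):
    """Build a db2batch argument list using the options seen in Example 7-43."""
    return ["db2batch", "-d", database, "-f", query_file,
            "-a", f"{user}/{password}", "-time", "off"]


def run_concurrently(commands):
    """Start every command before waiting on any of them; return the exit codes."""
    # All processes are launched first, so they overlap in time.
    processes = [subprocess.Popen(cmd, stdout=subprocess.DEVNULL) for cmd in commands]
    # Only then do we wait for each one to finish.
    return [process.wait() for process in processes]


if __name__ == "__main__":
    # Stand-in workload (assumption): three short Python processes, not db2batch.
    stand_in = [sys.executable, "-c", "import time; time.sleep(0.2)"]
    print(run_concurrently([stand_in] * 3))
    # Against a real database, the commands would instead be built like this:
    print(db2batch_command("sample", "query_easy.sql", "myuser", "mypassword"))
```

Because every process is started before any is waited on, the queries overlap 
even though each process writes its own sequential-looking output. 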


On Windows, use db2cmd to run the test. We provide a set of scripts for the 
Windows environment in the 09a_conc_exec_Win.bat file in Appendix B, “Scripts 
for DB2 workload manager configuration” on page 299. 

Example 7-44 shows the results of the script (10_conc_check.sql) that checks the 
workload executions per subclass and the high water mark for the number of 
concurrent queries. Because we have not defined new workload objects yet, 
all the connections are under the default WLM workload object 
SYSDEFAULTUSERWORKLOAD, as shown in the first set of output. The second 
set of output lists the executions and the concurrency high water mark for each 
service subclass. 
Example 7-44 Checking concurrency 

db2admin@node01:~/WLM> db2 -vtf 10_conc_check.sql 
Queries executed by workloads 

SELECT CONCURRENT_WLO_TOP, SUBSTR(WORKLOAD_NAME,1,25) AS WORKLOAD_NAME FROM 
TABLE(WLM_GET_WORKLOAD_STATS_V97(CAST(NULL AS VARCHAR(128)), -2)) AS WLSTATS WHERE DBPARTITIONNUM = 0 
ORDER BY WORKLOAD_NAME 

CONCURRENT_WLO_TOP WORKLOAD_NAME 

0                  SYSDEFAULTADMWORKLOAD 
8                  SYSDEFAULTUSERWORKLOAD 

2 record(s) selected. 

============== Workloads executed by Subclasses ==================== 

SELECT VARCHAR( SERVICE_SUPERCLASS_NAME, 27) SUPERCLASS, VARCHAR( SERVICE_SUBCLASS_NAME, 18) 
SUBCLASS, COORD_ACT_COMPLETED_TOTAL as NUMBEREXECS, CONCURRENT_ACT_TOP as CONC_HWM FROM 
TABLE(WLM_GET_SERVICE_SUBCLASS_STATS_V97('','',-1)) AS T 

SUPERCLASS                  SUBCLASS           NUMBEREXECS CONC_HWM 

SYSDEFAULTSYSTEMCLASS       SYSDEFAULTSUBCLASS 
SYSDEFAULTMAINTENANCECLASS  SYSDEFAULTSUBCLASS 
SYSDEFAULTUSERCLASS 
                            ETL 
                            TRIVIAL 
                            MINOR 
                            SIMPLE 
                            MEDIUM 
                            COMPLEX 

10 record(s) selected. 

db2admin@node01:~/WLM> 


Monitoring service subclasses 

Now that we have the service subclasses created in the MAIN service 
superclass, and the work action set is sending the queries to the appropriate 
service subclass, we will monitor the activities in each service subclass. As 
mentioned before, service subclasses are in charge of actually executing the 
queries sent to the database. 

Referring to Table 7-9 on page 249, you can see that we are preparing to limit the 
concurrent execution in the ETL, MEDIUM, and COMPLEX subclasses. In the 
untuned WLM configuration, we set up a limit on the number of concurrent query 
executions. Also in Table 7-9 on page 249, you can see that, with the exception of 
the SYSDEFAULTSUBCLASS and ETL subclasses, all the subclasses have a timeout 
threshold to prevent runaway queries. These thresholds are created with the 
CREATE THRESHOLD statement. Example 7-45 shows the result of the 
11_create_timeout_thresholds.sql script. 
Example 7-45 Script 11_create_timeout_thresholds.sql 

db2admin@node01:~/WLM> db2 -vtf 11_create_timeout_thresholds.sql 

THRESHOLD_NAME       THRESHOLD_TYPE  MAXVALUE 

TH_TIME_SC_TRIVIAL   TOTALTIME       60 
TH_TIME_SC_MINOR     TOTALTIME       300 
TH_TIME_SC_SIMPLE    TOTALTIME       1800 
TH_TIME_SC_MEDIUM    TOTALTIME       3600 
TH_TIME_SC_COMPLEX   TOTALTIME       14400 

5 record(s) selected. 

db2admin@node01:~/WLM> 
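
To make the effect of these timeout values concrete, the following Python 
sketch checks whether a given elapsed time would exceed the limit of a 
subclass. It is an illustration only. The limits are copied from 
Example 7-45, and the sketch assumes that MAXVALUE is expressed in seconds; 
the function name exceeds_timeout is hypothetical. 

```python
# Timeout limits per service subclass, copied from Example 7-45.
# Assumption: the MAXVALUE column is expressed in seconds.
TIMEOUT_SECONDS = {
    "TRIVIAL": 60,
    "MINOR": 300,
    "SIMPLE": 1800,
    "MEDIUM": 3600,
    "COMPLEX": 14400,
}


def exceeds_timeout(subclass: str, elapsed_seconds: float) -> bool:
    """Return True if an activity's elapsed time is over its subclass limit."""
    limit = TIMEOUT_SECONDS.get(subclass)
    # Subclasses without a threshold (ETL, SYSDEFAULTSUBCLASS) never time out.
    return limit is not None and elapsed_seconds > limit


if __name__ == "__main__":
    for subclass, elapsed in [("TRIVIAL", 45), ("MINOR", 600), ("ETL", 100000)]:
        print(subclass, elapsed, exceeds_timeout(subclass, elapsed))
```

Under that assumption, the limits correspond to 1 minute, 5 minutes, 
30 minutes, 1 hour, and 4 hours. 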


Setting the automatic statistics collection time interval 

The default setting for the WLM_COLLECT_INT DB2 database parameter is 0 
(zero), which means that the WLM statistics data collected by the monitors will 
never be sent to the event monitor output tables, nor reset. To keep a history of 
the system workload, you must set the WLM_COLLECT_INT parameter to 
the desired time interval, in minutes. Because we want to monitor the system 
workload closely to properly adjust the service class concurrency levels, we 
selected an initial interval of five minutes. After the service class 
concurrency levels have been determined, this interval can be changed to 30 or 
60 minutes. See Example 7-46. 

Example 7-46 Setting the wlm_collect_int parameter 

db2admin@node01:~/WLM> db2 update db cfg using WLM_COLLECT_INT 5 
DB20000I The UPDATE DATABASE CONFIGURATION command completed successfully. 
db2admin@node01:~/WLM> 


For information about the wlm_collect_int parameter, see the website: 

http://publib.boulder.ibm.com/infocenter/db2luw/v9r7/index.jsp?topic=/com.ibm.db2.luw.admin.config.doc/doc/r0051457.html 

Operating system level monitoring with NMON 

To help determine the appropriate service class concurrency levels, you 
might have to refer to the overall system monitoring. An excellent tool for this task 
is the NMON tool and its companion, the NMON_Analyzer. After collecting 
operating system workload statistics, you can look for overload periods (such as 
100% CPU) and cross-reference them with the DB2 WLM statistics to check whether 
the concurrency levels need to be turned down to keep the CPUs from 
reaching 100% utilization for long periods of time. 

The NMON tool can be used interactively or in data-collection mode. We use the 
second option and save the results to a file by specifying the -f option 
(Example 7-47). By default, the data-collection mode collects 
data at a five-minute interval (-s 300) for 24 hours (-c 288), then stops. This 
interval matches the one we set for the WLM_COLLECT_INT parameter. The 
default name of the output file is <hostname>_YYYYMMDD_HHMM.nmon. 

Example 7-47 Collecting NMON statistics 
nmon -f 
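
The relationship between the snapshot interval (-s) and the snapshot count 
(-c) is simple arithmetic, shown here as a small Python sketch. It is a 
helper for illustration only; the function names are hypothetical, and only 
the nmon options -f, -s, and -c mentioned in the text are used. 

```python
def snapshot_count(interval_seconds: int, duration_hours: int) -> int:
    """Number of snapshots needed to cover the duration at the given interval."""
    total_seconds = duration_hours * 3600
    if total_seconds % interval_seconds != 0:
        raise ValueError("duration is not a whole multiple of the interval")
    return total_seconds // interval_seconds


def nmon_command(interval_seconds: int, duration_hours: int) -> list:
    """Build an nmon data-collection command line for the requested coverage."""
    count = snapshot_count(interval_seconds, duration_hours)
    return ["nmon", "-f", "-s", str(interval_seconds), "-c", str(count)]


if __name__ == "__main__":
    # Five-minute snapshots for one day, matching the defaults described above.
    print(" ".join(nmon_command(300, 24)))
```

Keeping the NMON interval equal to WLM_COLLECT_INT (five minutes here) makes 
it easier to line up operating system samples with the WLM statistics rows. 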


You can send the output file to your Windows client and analyze the data with the 
NMON_Analyzer tool. Figure 7-8 shows a sample analysis report from the 
NMON_Analyzer tool. 



In Figure 7-8, we can see that the CPUs are spending almost 50% of the time 
processing system requests, instead of user requests. This indicates that the 
service subclass concurrency levels might need to be lowered. Running fewer 
queries at a time leaves more system resources for each of them, reducing 
system time and increasing user time. Indications of system overload are high 
process switching, high paging activity, and long run queues, shown in the PROC 
tab of the same NMON report. 

In this configuration, the thresholds are not enforced yet, allowing any number of 
queries to be executed simultaneously. You can therefore look up the actual 
concurrency level in the WLM control tables for the same time interval as the 
system peak observed in NMON. Use those numbers as references when 
setting the threshold levels during the next stage of WLM configuration. 

To check the concurrency levels of queries already executed during a specific 
time period, you can use a query similar to the one in Example 7-48. If you use 
this script (12_subclass_concurrency.sql), be sure to adjust the date and time to 
the desired period before executing it. The CONCURRENT_ACT_TOP column of the 
report shows the high water mark of concurrent activities in each service 
subclass during each collection interval. 

Example 7-48 Checking service class concurrency during overload period 

db2admin@node01:~/WLM> db2 -vtf 12_subclass_concurrency.sql 
select concurrent_act_top, varchar(service_subclass_name,20) as subclass, 
varchar(service_superclass_name,30) as superclass, statistics_timestamp from 
scstats_db2statistics where statistics_timestamp between '2010-11-15-15.00.00' 
and '2010-11-15-15.30.00' 

CONCURRENT_ACT_TOP SUBCLASS             SUPERCLASS                     STATISTICS_TIMESTAMP 

                   SYSDEFAULTSUBCLASS   SYSDEFAULTSYSTEMCLASS          2010-11-15-15.01.23.304973 
                   SYSDEFAULTSUBCLASS   SYSDEFAULTMAINTENANCECLASS     2010-11-15-15.01.23.304973 
                   SYSDEFAULTSUBCLASS   SYSDEFAULTUSERCLASS            2010-11-15-15.01.23.304973 
                   SYSDEFAULTSUBCLASS   MAIN                           2010-11-15-15.01.23.304973 
                   ETL                  MAIN                           2010-11-15-15.01.23.304973 
                   TRIVIAL              MAIN                           2010-11-15-15.01.23.304973 
                   MINOR                MAIN                           2010-11-15-15.01.23.304973 
                   SIMPLE               MAIN                           2010-11-15-15.01.23.304973 
                   MEDIUM               MAIN                           2010-11-15-15.01.23.304973 
                   COMPLEX              MAIN                           2010-11-15-15.01.23.304973 
                   SYSDEFAULTSUBCLASS   SYSDEFAULTSYSTEMCLASS          2010-11-15-15.06.23.812014 
                   SYSDEFAULTSUBCLASS   SYSDEFAULTMAINTENANCECLASS     2010-11-15-15.06.23.812014 
                   SYSDEFAULTSUBCLASS   SYSDEFAULTUSERCLASS            2010-11-15-15.06.23.812014 
                   SYSDEFAULTSUBCLASS   MAIN                           2010-11-15-15.06.23.812014 
                   ETL                  MAIN                           2010-11-15-15.06.23.812014 
                   TRIVIAL              MAIN                           2010-11-15-15.06.23.812014 
                   MINOR                MAIN                           2010-11-15-15.06.23.812014 
                   SIMPLE               MAIN                           2010-11-15-15.06.23.812014 
                   MEDIUM               MAIN                           2010-11-15-15.06.23.812014 
                   COMPLEX              MAIN                           2010-11-15-15.06.23.812014 
                   SYSDEFAULTSUBCLASS   SYSDEFAULTSYSTEMCLASS          2010-11-15-15.11.23.866814 
                   SYSDEFAULTSUBCLASS   SYSDEFAULTMAINTENANCECLASS     2010-11-15-15.11.23.866814 
                   SYSDEFAULTSUBCLASS   SYSDEFAULTUSERCLASS            2010-11-15-15.11.23.866814 
                   SYSDEFAULTSUBCLASS   MAIN                           2010-11-15-15.11.23.866814 
                   ETL                  MAIN                           2010-11-15-15.11.23.866814 
                   TRIVIAL              MAIN                           2010-11-15-15.11.23.866814 
                   MINOR                MAIN                           2010-11-15-15.11.23.866814 
                   SIMPLE               MAIN                           2010-11-15-15.11.23.866814 
                   MEDIUM               MAIN                           2010-11-15-15.11.23.866814 
                   COMPLEX              MAIN                           2010-11-15-15.11.23.866814 

30 record(s) selected. 

db2admin@node01:~/WLM> 
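
Once rows like those in Example 7-48 have been retrieved, finding the peak 
concurrency per subclass within an overload window is a small aggregation. 
The Python sketch below illustrates the idea on in-memory rows. It is not 
part of the provided scripts: the function names are hypothetical, and the 
CONCURRENT_ACT_TOP values in the sample rows are invented for illustration, 
because they are not taken from the example output. 

```python
from datetime import datetime

TIMESTAMP_FORMAT = "%Y-%m-%d-%H.%M.%S.%f"


def parse_timestamp(text: str) -> datetime:
    """Parse a timestamp in the format shown in Example 7-48."""
    return datetime.strptime(text, TIMESTAMP_FORMAT)


def peak_concurrency(rows, start: datetime, end: datetime) -> dict:
    """Return the highest CONCURRENT_ACT_TOP per (superclass, subclass) in the window."""
    peaks = {}
    for superclass, subclass, act_top, timestamp in rows:
        if start <= parse_timestamp(timestamp) <= end:
            key = (superclass, subclass)
            peaks[key] = max(peaks.get(key, 0), act_top)
    return peaks


if __name__ == "__main__":
    # Hypothetical sample rows: the concurrency values are made up.
    sample_rows = [
        ("MAIN", "MEDIUM", 3, "2010-11-15-15.01.23.304973"),
        ("MAIN", "MEDIUM", 7, "2010-11-15-15.06.23.812014"),
        ("MAIN", "COMPLEX", 2, "2010-11-15-15.06.23.812014"),
        ("MAIN", "MEDIUM", 9, "2010-11-15-15.41.23.000000"),
    ]
    window_start = parse_timestamp("2010-11-15-15.00.00.000000")
    window_end = parse_timestamp("2010-11-15-15.30.00.000000")
    print(peak_concurrency(sample_rows, window_start, window_end))
```

The window boundaries would be taken from the overload period identified in 
the NMON report, and the resulting peaks serve as reference points when 
choosing concurrency thresholds. 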
The NMON tool is included with AIX from 5.3 TL09 and AIX 6.1 TL02. For Linux, 
you can download it from this link: 

http://nmon.sourceforge.net/pmwiki.php 

For the NMON and NMON_Analyzer references, see: 

http://www.ibm.com/developerworks/wikis/display/WikiPtype/nmon 

http://www.ibm.com/developerworks/aix/library/au-nmon_analyser/ 

Monitoring the default workload 

Although at this stage all users and applications are mapped only to the 
default workload (SYSDEFAULTUSERWORKLOAD), it is important to monitor 
and understand the queries sent through this default workload to understand the 
database activity. Monitoring workloads is useful when you have to gather 
information about who (or what) is sending the queries to the database. This 
monitoring can also be applied to user-defined WLM workloads. 

Example 7-49 shows how to collect the data in the default workload by altering 
the default workload with our script 13_alter_default_workload.sql. 

Example 7-49 Altering the default workload 

db2admin@node01:~/WLM> db2 -vtf 13_alter_default_workload.sql 

alter workload sysdefaultuserworkload collect activity data on coordinator with 
details 
DB20000I The SQL command completed successfully. 


The monitored data is collected each time the call wlm_collect_stats() 
statement is run. All the data collected is stored in the monitor tables created in 
Example 7-36 on page 254. Those tables contain a great deal of detail. As a 
starting point, you can use the SELECT statement provided in Example 7-50 to 
see which SQL statements were sent to the database through the default 
workload. The script used is 14_dftwkload_statements.sql. 

Example 7-50 Selecting default workload captured data 

select varchar(session_auth_id,15) as user_name, varchar(appl_name,10) as 
appl_name, varchar(workloadname,25) as workload_name, 
varchar(service_superclass_name,10) as superclass, 
varchar(service_subclass_name,10) as subclass, date(time_started) as date, 
time(time_started) as time, varchar(stmt_text,150) as statement_text from 
wlm_event_stmt s, wlm_event e, syscat.workloads w where s.activity_id = 
e.activity_id and s.appl_id = e.appl_id and s.uow_id = e.uow_id and 
e.workload_id = 1 and e.workload_id = w.workloadid and date(e.time_started) = 
date(current timestamp) fetch first 5 rows only 

USER_NAME APPL_NAME WORKLOAD_NAME SUPERCLASS SUBCLASS DATE 
TIME STATEMENT_TEXT 

SLFERRARI javaw.exe SYSDEFAULTUSERWORKLOAD MAIN MINOR 11/02/2010 
00:38:05 SELECT count(*), sum(length(packed_desc))/1024/4*2 from sysibm.systables 
SLFERRARI javaw.exe SYSDEFAULTUSERWORKLOAD MAIN MINOR 11/02/2010 
01:05:01 VALUES(SUBSTR(CAST(? AS CLOB(56)), CAST(? AS INTEGER), CAST(? AS INTEGER))) 
SLFERRARI javaw.exe SYSDEFAULTUSERWORKLOAD MAIN MINOR 11/02/2010 
00:38:08 SELECT count(*) from sysibm.systables where type='T' and creator<>'SYSIBM' 
SLFERRARI javaw.exe SYSDEFAULTUSERWORKLOAD MAIN MINOR 11/02/2010 
00:38:08 SELECT BPNAME, NPAGES, PAGESIZE FROM SYSIBM.SYSBUFFERPOOLS ORDER BY BPNAME 
SLFERRARI javaw.exe SYSDEFAULTUSERWORKLOAD MAIN MINOR 11/02/2010 
00:39:00 select tb.bufferpoolid from syscat.tablespaces tb where tb.tbspace = 
'SYSCATSPACE' and not exists (select * from sysibm.systables st where st. 
SQL0445W Value "select tb.bufferpoolid from syscat.tablespaces tb where t" 
has been truncated. SQLSTATE=01004 


5 record(s) selected with 1 warning messages printed. 


Tuned DB2 workload manager environment 

After monitoring the system for a period of time, you can start tuning DB2 
workload manager parameters based on the monitor statistics to achieve a stable 
system operation. This can mean creating more workloads, implementing 
concurrency limits by workloads and by service classes, creating more service 
superclasses (2 to 5 total), and fine-tuning the SQL cost thresholds for the work 
action sets. 

A tuned system can achieve an overall CPU utilization of around 85-100%, with 
system CPU usage below 10%. Do not let the CPU run at 100% all the time; if it 
does, turn down the concurrency levels for one or more service classes. 
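The CPU guideline above can be checked against collected samples. The following Python sketch is only an illustration: the function name assess_cpu, the sample values, and the choice to treat usr + sys as "overall" utilization are assumptions, not part of the IBM Smart Analytics System tooling.

```python
def assess_cpu(samples, sys_limit=10.0, target_low=85.0, saturated_at=100.0):
    """Summarize (usr_pct, sys_pct) samples against the tuning guideline.

    Assumption: overall utilization is approximated as usr + sys.
    """
    n = len(samples)
    avg_sys = sum(s for _, s in samples) / n
    avg_total = sum(u + s for u, s in samples) / n
    saturated_share = sum(1 for u, s in samples if u + s >= saturated_at) / n
    return {
        "avg_total": avg_total,
        "avg_sys": avg_sys,
        "sys_ok": avg_sys < sys_limit,                 # system CPU below 10%
        "busy_enough": avg_total >= target_low,        # around 85% or more overall
        "always_saturated": saturated_share == 1.0,    # CPU pinned at 100% in every sample
    }


# Hypothetical samples, for example taken from vmstat output
samples = [(80.0, 6.0), (88.0, 7.0), (92.0, 8.0), (84.0, 7.0)]
report = assess_cpu(samples)
print(report)
```

If always_saturated is true over a long observation window, that is the situation in which the text suggests lowering the concurrency limit of one or more service classes.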

When applying changes to the system, make one change at a time and monitor 
the result. Then make another adjustment and monitor the result. This process 
can take a while to complete. 

Attention: Keep the DB2 workload manager objects and configuration as few and 
as simple as possible. 


Here we demonstrate the process of adjusting the DB2 workload manager 
configuration. Figure 7-9 shows a high level picture of the database environment 
to be set up. Note that this configuration is for demonstration purposes only, and 
its values are not necessarily the preferred ones. Base your adjustments on the 
observations from the untuned configuration environment. 



We categorize users and applications by role and create new workloads to 
manage them. With DB2 workload manager, you can set limits at the 
workload level to prevent a set of users from monopolizing the system, or to 
control resource usage by applying limits to user sets based on the business 
objectives. 

Administering a group as a workload allows you to apply particular monitoring 
criteria to that workload, including enabling or disabling monitoring for each 
workload separately. 

Creating roles 

Grouping users with similar workload behavior or business functions allows you 
to map their connections to a DB2 workload manager workload and apply unique 
rules to each workload. You can create DB2 roles to group users and 
applications. The rule of thumb is to create only the roles needed and to keep 
the number of roles as small as possible. 

As an example, we create the roles Adhoc, DBAs, PWRUSR, and Guest, and grant 
these roles to users user1 through user9. Example 7-51 shows the DDL 
statements. The script used is 50_create_roles.sql. 


Example 7-51 Creating DB2 roles 


CREATE ROLE Adhoc; 
GRANT ROLE Adhoc TO USER user1; 
GRANT ROLE Adhoc TO USER user2; 
GRANT ROLE Adhoc TO USER user3; 
commit; 

CREATE ROLE DBAs; 
GRANT ROLE DBAs TO USER user4; 
GRANT ROLE DBAs TO USER user5; 
commit; 

CREATE ROLE PWRUSR; 
GRANT ROLE PWRUSR TO USER user6; 
GRANT ROLE PWRUSR TO USER user7; 
commit; 

CREATE ROLE GUEST; 
GRANT ROLE GUEST TO USER user8; 
GRANT ROLE GUEST TO USER user9; 
commit; 


For more details about the CREATE ROLE statement, see the DB2 Information 
Center at: 

http://publib.boulder.ibm.com/infocenter/db2luw/v9r7/index.jsp?topic=/com.ibm.db2.luw.sql.ref.doc/doc/r0050615.html 

Creating additional workloads 

Create new DB2 workload manager workloads and map the groups of users and 
applications to the workloads for more granular control and monitoring. 
Creating new workloads allows you to group connections and units of work that 
are to be monitored and controlled as peers. The work they submit is still 
treated in a common way with work from all other workloads: it is evaluated and 
placed into the appropriate service subclass based on its projected impact. 

Any user or application not included in one of the new workloads is considered 
an unknown user or application and is handled by the default workload. 
Analyze these unclassified workloads and assign them to a proper workload. 

Do not create more workloads than necessary to group users or applications 
reasonably. Between 5 and 20 workloads is usually a good number. 

A workload has to be mapped to a service class, either a superclass or a 
subclass. Mapping a workload to a service superclass allows the work action set 
to decide to which service subclass the SQL query is sent, based on 
parameters such as timeron cost or SQL type. 

Example 7-52 shows the DDL (51_create_workloads.sql) to create three 
workloads W1, W2, and W3 for the roles created in Example 7-51. You must 
grant the workload usage to the desired user, group, role, or public. 

Example 7-52 Creating workloads 

CREATE WORKLOAD W1 SESSION_USER ROLE ('DBAS') SERVICE CLASS MAIN POSITION AT 1 
GRANT USAGE on WORKLOAD W1 to public 

CREATE WORKLOAD W2 SESSION_USER ROLE ('ADHOC', 'PWRUSR') SERVICE CLASS MAIN POSITION AT 2 
commit 

GRANT USAGE on WORKLOAD W2 to public 
commit 

CREATE WORKLOAD W3 SESSION_USER ROLE ('GUEST') SERVICE CLASS MAIN POSITION AT 3 
commit 

GRANT USAGE on WORKLOAD W3 to public 


For more details about the CREATE WORKLOAD statement, see the DB2 
Information Center at: 

http://publib.boulder.ibm.com/infocenter/db2luw/v9r7/index.jsp?topic=/com.ibm.db2.luw.sql.ref.doc/doc/r0050554.html 

Defining query concurrency using thresholds 

In warehouse environments, with their typical long-running queries, lowering 
query execution priority to avoid system overload can potentially make things 
even worse, because the slowed-down queries then hold resources even longer 
than usual. Instead, limit the number of concurrent workload executions to 
prevent an IBM Smart Analytics System from overloading. This can be done at 
the workload level, at the service subclass level, or at both, and is 
implemented by creating thresholds. 

The idea is to execute fewer concurrent queries so they can finish faster, instead 
of trying to execute a large number of them at once, in which case they compete 
for resources inefficiently. Limiting the number of concurrent queries has 
proven to be a better solution in real-life systems, as shown in Figure 7-10. 
Limiting the concurrency of large queries is far more effective than limiting 
smaller queries, and is typically more acceptable to the end-user population. 

Figure 7-10 Real-life results 


A threshold limit on a workload controls the “share” of system 
resources that can be consumed by connections mapping to that workload. 
Depending on how the threshold is imposed, it can also control the share for 
particular classes of work submitted within that workload definition. Creating a 
threshold on workloads prevents a set of users or applications from using too 
many system resources. For example, if workload W3 has a limit of five 
concurrent executions, that limit is imposed even if the system is idle 
and can handle more queries. This limit does not take into account the cost of a 
query. Any request beyond the first five concurrent executions is put on a queue. 
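The queueing behavior described for workload W3 can be pictured with a small model. The Python sketch below is a toy illustration only: the class name ConcurrencyGate and its first-in, first-out queue are assumptions made for the example, and it does not call or emulate any DB2 interface.

```python
from collections import deque


class ConcurrencyGate:
    """Toy model of a concurrency limit: at most `limit` activities run,
    and later arrivals wait in FIFO order (the ordering is an assumption)."""

    def __init__(self, limit):
        self.limit = limit
        self.running = set()
        self.queue = deque()

    def submit(self, activity_id):
        # The limit applies regardless of query cost or how idle the system is
        if len(self.running) < self.limit:
            self.running.add(activity_id)
            return "running"
        self.queue.append(activity_id)
        return "queued"

    def finish(self, activity_id):
        # A finished activity frees one slot for the oldest queued activity
        self.running.remove(activity_id)
        if self.queue:
            promoted = self.queue.popleft()
            self.running.add(promoted)
            return promoted
        return None


gate = ConcurrencyGate(limit=5)          # limit of five, as in the W3 example
states = [gate.submit(f"q{i}") for i in range(1, 8)]
promoted = gate.finish("q1")
print(states, promoted)
```

Seven submissions against a limit of five leave two activities queued; when one running activity finishes, the oldest queued activity takes its slot.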

A threshold limit on a service class controls the “share” of system 
resources consumed by the class of work represented by that service class. 
Creating a threshold at the service class level (superclass or subclass) limits the 
execution at this particular level, regardless of who (or what) submitted it. 

On the IBM Smart Analytics System, typically, the vast majority of queries are 
very small and quick and are executed in the Minor or Small service subclasses. 
They are usually left without a concurrency limit. Limiting the high cost subclass, 
or maybe the medium complexity subclasses, is far more effective, because these 
few but demanding queries can overload the system. Another task that is often 
controlled is the load operation. 

The appropriate values for the concurrency limits depend on your system 
workloads, as observed in the untuned DB2 workload manager configuration. 

For each of the service subclasses (except for SYSDEFAULTSUBCLASS and 
ETL subclasses), also create a timeout threshold to prevent runaway queries 
from running much longer than expected for that service subclass. 

Example 7-53 shows the script (52_enforce_thresholds.sql) to enforce the 
thresholds created earlier (untuned configuration) and its output. Use this script 
to change the concurrency levels of the thresholds to avoid system overload. 

Example 7-53 Creating thresholds 


ALTER THRESHOLD TH_TIME_SC_TRIVIAL WHEN ACTIVITYTOTALTIME > 1 MINUTE COLLECT ACTIVITY 
DATA on COORDINATOR WITH DETAILS STOP EXECUTION 

ALTER THRESHOLD TH_TIME_SC_MINOR WHEN ACTIVITYTOTALTIME > 5 MINUTES COLLECT ACTIVITY 
DATA on COORDINATOR WITH DETAILS STOP EXECUTION 

ALTER THRESHOLD TH_TIME_SC_SIMPLE WHEN ACTIVITYTOTALTIME > 30 MINUTES COLLECT 
ACTIVITY DATA on COORDINATOR WITH DETAILS STOP EXECUTION 

ALTER THRESHOLD TH_TIME_SC_MEDIUM WHEN ACTIVITYTOTALTIME > 60 MINUTES COLLECT 
ACTIVITY DATA on COORDINATOR WITH DETAILS STOP EXECUTION 

ALTER THRESHOLD TH_TIME_SC_COMPLEX WHEN ACTIVITYTOTALTIME > 240 MINUTES COLLECT 
ACTIVITY DATA on COORDINATOR WITH DETAILS STOP EXECUTION 

select varchar(THRESHOLDNAME,25) as Threshold_name, 
varchar(THRESHOLDPREDICATE,25) as Threshold_Type, maxvalue from 
syscat.thresholds 

THRESHOLD_NAME            THRESHOLD_TYPE            MAXVALUE 

TH_TIME_SC_TRIVIAL        TOTALTIME                       60 
TH_TIME_SC_MINOR          TOTALTIME                      300 
TH_TIME_SC_SIMPLE         TOTALTIME                     1800 
TH_TIME_SC_MEDIUM         TOTALTIME                     3600 
TH_TIME_SC_COMPLEX        TOTALTIME                    14400 


8 record(s) selected. 
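The MAXVALUE column in the output above lines up with the ACTIVITYTOTALTIME limits in the ALTER THRESHOLD statements if MAXVALUE is read as seconds. That reading is an inference from the numbers shown, not a statement taken from the DB2 documentation. A short Python sketch of the arithmetic:

```python
# Timeout limits in minutes, as written in the ALTER THRESHOLD statements
timeout_minutes = {
    "TH_TIME_SC_TRIVIAL": 1,
    "TH_TIME_SC_MINOR": 5,
    "TH_TIME_SC_SIMPLE": 30,
    "TH_TIME_SC_MEDIUM": 60,
    "TH_TIME_SC_COMPLEX": 240,
}

# Assumption: MAXVALUE for these thresholds is the same limit in seconds
maxvalue_seconds = {name: minutes * 60 for name, minutes in timeout_minutes.items()}

for name, seconds in maxvalue_seconds.items():
    print(f"{name:<20} {seconds:>6}")
```

The computed values can be compared row by row with the query output when you change a timeout and want to confirm that the new limit is in effect.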


Adjusting SQL cost range for a work action set 

If required, you can readjust the attributes of a work action set. Example 7-54 
(53_alter_workclasses.sql) shows how to change the boundary between the 
work classes SIMPLE and MEDIUM, from 300,000 to 400,000. The other 
thresholds remain unchanged. 

Example 7-54 Change SQL cost thresholds command 


db2admin@node01:~/WLM> db2 -vtf 53_alter_workclasses.sql 
ALTER WORK CLASS SET "WORK_CLASS_SET_1" ALTER WORK CLASS "WCLASS_SIMPLE" FOR 
TIMERONCOST FROM 30000 to 40000 POSITION AT 3 ALTER WORK CLASS "WCLASS_MEDIUM" 
FOR TIMERONCOST FROM 40000 TO 5000000 POSITION AT 4 
DB20000I The SQL command completed successfully. 

COMMIT WORK 

DB20000I The SQL command completed successfully. 
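To picture how timeron cost ranges route a query to a work class, the following Python sketch classifies an estimated cost against the two ranges that appear in the Example 7-54 command. The function name work_class_for is hypothetical, only the two work classes from the example are modeled, and treating each range as including its lower bound and excluding its upper bound is an assumption made so that the ranges do not overlap.

```python
# (work class, FROM cost, TO cost) as printed in the Example 7-54 command
WORK_CLASS_RANGES = [
    ("WCLASS_SIMPLE", 30000, 40000),
    ("WCLASS_MEDIUM", 40000, 5000000),
]


def work_class_for(timeron_cost):
    """Return the work class whose cost range contains timeron_cost.

    Assumption: ranges are half-open, [low, high). Costs outside the two
    modeled ranges return None because the other work classes are not
    shown in the example.
    """
    for name, low, high in WORK_CLASS_RANGES:
        if low <= timeron_cost < high:
            return name
    return None


print(work_class_for(35000))    # falls in the SIMPLE range
print(work_class_for(40000))    # boundary value, falls in the MEDIUM range here
print(work_class_for(10))       # not covered by the two modeled ranges
```

Moving the boundary upward, as the ALTER WORK CLASS SET statement does, means that queries with a cost just above the old boundary are treated as SIMPLE rather than MEDIUM work.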


For more information about the CREATE THRESHOLD statement, see the DB2 
Information Center at: 

http://publib.boulder.ibm.com/infocenter/db2luw/v9r7/index.jsp?topic=/com.ibm.db2.luw.sql.ref.doc/doc/r0050563.html 

Preventing unknown workloads from executing 

If you prefer to block unidentified workloads instead of monitoring them, you 
can do so by altering the default workload behavior. Although you cannot disable 
the default workload as you can a user-defined workload, you can disallow it from 
accessing the database, using this command: 

ALTER WORKLOAD sysdefaultuserworkload DISALLOW DB ACCESS 


Warning: Make sure that the user ID used to disallow the default 
workload belongs to a WLM workload other than the default. Otherwise, after 
you disallow database access for the default workload, you will not be able to 
send any more commands to the database unless you hold DBADM or WLMADM 
authority and issue SET WORKLOAD TO SYSDEFAULTADMWORKLOAD. 


7.2.3 DB2 workload manager resources 

The following resources provide more details about DB2 workload manager: 

► DB2 9.5 documentation: 

http://publib.boulder.ibm.com/infocenter/db2luw/v9r5/index.jsp 

► DB2 9.7 documentation: 

http://publib.boulder.ibm.com/infocenter/db2luw/v9r7/index.jsp 

► DB2 workload manager best practices: 

http://www.ibm.com/developerworks/data/bestpractices/workloadmanagement/ 

► DB2 9.7: Using Workload Manager features: 

http://www.ibm.com/developerworks/data/tutorials/dm-0908db2workload/index.html 

► DB2 WLM Hands on Tutorial, in DB2 9.5 documentation: 

http://publib.boulder.ibm.com/infocenter/db2luw/v9r5/topic/com.ibm.db2.luw.admin.wlm.doc/doc/c0053139.html 

► developerWorks site to download the tutorial scripts: 

http://www-128.ibm.com/developerworks/forums/servlet/JiveServlet/download/1116-179878-14005115-301960/wlmiodlab.zip 

► Article on DB2 Workload Management Histograms (3 Parts) in the Smart 
Data Administration e-Kit: 

http://www.ibm.com/developerworks/data/kits/dbakit/index.html 

► White paper: Workload Management with MicroStrategy Software and IBM 
DB2 9.5: 

http://www-01.ibm.com/software/sw-library/en_US/detail/G407381L49488H62.html 

► Exploitation of DB2’s Workload Management in an SAP Environment: 

https://www.sdn.sap.com/irj/scn/go/portal/prtroot/docs/library/uuid/d046f3f5-13c5-2b10-179d-80b6ae7b9657 

► IBM Redbooks publication: 

http://www.redbooks.ibm.com/redpieces/abstracts/sg247524.html 


7.3 Capacity planning 

In Chapter 6, “Performance troubleshooting” on page 127, we discuss ways to 
monitor the various resources of the system, including CPU, I/O, memory, 
and network. In various cases, you might determine that your current system 
no longer allows you to meet your Service Level Agreements, due to an 
increase or change in data volume and workload. 

IBM Smart Analytics System offerings are highly scalable and offer various 
options in terms of capacity planning. In this section, we give an overview of 
identifying resource requirements, then we review the various options available 
for capacity planning. 

7.3.1 Identifying resource requirements 

When considering capacity planning, it is important to understand the current 
resource bottlenecks on your system. This can only be achieved if you have a 
clear picture of the current performance of your system through ongoing 
long-term performance monitoring. 

Here are various performance considerations regarding monitoring: 

► OS level monitoring: CPU, I/O, memory, and network utilization collected on a 
regular basis 

► DB2 level monitoring: DB2 performance metrics, such as buffer pool 
utilization, temporary table space usage, FCM buffers usage, number of 
applications connected, and nature of the query workload (rows 
read/written/returned for example) 

The data allows matching the system resources utilization (CPU, I/O, memory, 
and network) to a given DB2 workload. 

Ongoing long-term performance monitoring allows you to establish a baseline for 
the current performance of your system. This baseline is essential for trend and 
pattern analysis to confirm whether potential bottlenecks result from legitimate 
resource requirements due to changes in the nature and concurrency of your 
workload. 

Tools are available to perform this type of monitoring, such as the Performance 
Analytics feature, or Optim Performance Manager. 

In order to ensure that resources are used optimally by DB2, it is essential to 
check that no additional tweaking or configuration changes might help from 
various perspectives: 

► From an application perspective, monitor all the applications and make sure 
that there is no resource misuse (for example, rogue query due to bad access 
plan or poorly written query). This situation is the first item to check when the 
resource usage pattern shows a sudden change. 

► From the database perspective, make sure that the database is well designed 
and maintained: 

- Best practices in database design: This approach includes absence of 
data skew, collocation of the most frequently joined tables, appropriate 
indexes, and MQTs for dimension tables. 

- Best practices in database maintenance: This approach includes 
up-to-date complete table and index statistics, including distribution 
statistics, and reorganization of the most frequently accessed and large 
tables when needed. 

- Tuning: Review whether the bottleneck can be alleviated through the 
database tunables to improve the overall efficiency of your database 
(BUFFERPOOL, SORTHEAP, and SHEAPTHRES tunables, for example). 

► From an overall workload management perspective, look at opportunities to 
manage the workload on the system using the DB2 workload manager, and 
prioritize your workload. This action can result in an optimal use of the system 
resources by preventing conflicting workloads from running in parallel. 

In terms of resource bottlenecks such as CPU, I/O, or network, the resource 
usage is tied to the type and concurrency of query and utility workload you are 
running. CPU and I/O saturation can be caused by poorly written queries or 
applications, or a poorly maintained database. Best practices in database design 
such as ensuring proper distribution keys, join collocation and the use of MQTs 
for dimension tables might help in reducing the usage of network. 

For a memory bottleneck, do a memory usage analysis to understand how the 
memory is being used: 

► Memory usage: Main memory consumers have to be accounted for. You need 
to have a clear understanding of the DB2 memory usage, and know what is 
being used in DB2 shared memory, DB2 private memory, and by the OS and 
kernel (kernel memory, file system caching, network buffers, and so on). This 
information is discussed in 6.3.3, “DB2 memory usage” on page 186. 

► After you have a clear picture of how the memory is being used, look for 
opportunities in tuning down part of the memory usage by reducing the 
largest memory consumers (such as buffer pool or SHEAPTHRES) without 
affecting the baseline for your current level of performance. 

► Consider IBM Smart Analytics System capacity planning options. 

In terms of capacity planning, the IBM Smart Analytics System is highly scalable: 

► There are options to increase the capacity of your existing servers in terms of 
CPU, memory, and SSD. 

► You can scale up your database by adding additional data modules. 

In the following sections, we examine the various options available to increase 
the capacity of your existing system. 

7.3.2 Increasing capacity of existing systems 

All members of the IBM Smart Analytics System family have options to increase 
the capacity of existing systems in terms of memory, CPU, and SSD when 
applicable. 

Table 7-10 lists these options for the various systems. 

For all the options described next, the number of processors and amount of 
memory per server must be the same for the nodes contained in each of the 
following groups (but does not need to be the same across the groups): 

► All data, administration, user, and standby nodes 

► All business intelligence nodes 

► All warehouse applications module nodes 

5600 V1 and 5600 V2 environments 

For 5600 V1 and 5600 V2 environments, you can increase the CPU, memory, and 
SSDs of the base servers to the following specifications. 

Table 7-10 describes these options. 


Table 7-10 Options for 5600 models 

          5600 V1                                5600 V2 
          5600 V1          5600 with SSD V1      5600 V2          5600 with SSD V2 
CPU       Quad-core Intel  2 Quad-core Intel     1 6-core Intel   2 6-core Intel 
          Xeon X5570       Xeon X5570            X5680            X5680 
          (4 cores)        (8 cores)             (6 cores)        (12 cores) 
Memory    32 GB            64 GB                 64 GB            128 GB 
SSD       n/a              2 x FusionIO 320 GB   n/a              2 x FusionIO 320 GB 


Additional options are available for upgrading the network: 

► Upgrading switches to 10 GbE (5600 V2 only) 

► LAN-free backup option: Addition of a FC HBA dedicated to backup 

► LAN-based backup option: Addition of a quad-port ethernet NIC dedicated to 
backup 

Not all combinations of these options are available due to hardware restrictions. 
Consult with IBM Smart Analytics system support for further details. 

7600 environment 

For the 7600 environment, the following upgrade options are available: 

► Double the CPU: Two additional dual-core 5.0 GHz POWER6® processors, 
so 8 cores in total 

► Double the memory: Additional 32 GB available, so 64 GB in total 

7700 environment 

The 7700 environment has the following options available per server: 

► SSD: By default, one 800 GB PCIe solid-state drive (SSD) RAID card is 
installed on the 7700. There are various options available to increase your 
SSD capacity: 

- Add one additional 800 GB SSD card per server. 

- Add an EXP12x expansion drawer to add one, three, or five more PCIe 
SSD cards to increase the total capacity to respectively two, four, or six 
800 GB PCIe SSD cards per data node. 

► Network: For LAN backups, there are possibilities to add: 

- LAN-free backup: One dual-port 8 Gb Fiber Ethernet PCIe adapter for 
dedicated LAN-free backup. 

- LAN-based backup: One quad-port 1 Gb Copper Ethernet PCIe adapter, 
where one port can be used for a 1 Gb Ethernet corporate network and a 
second port can be used for dedicated 1 Gbps LAN-based backup. The 
other two ports are left available for other uses. 

Certain restrictions apply in the combination of the previous options due to 
hardware limitations. Consult the IBM Smart Analytics System 7700 User’s 
Guide, SC27-3571-01 for the restrictions. 


7.3.3 Adding additional modules 

Another option to scale your IBM Smart Analytics System is to add additional 
modules: 

► An additional user module can be added if the bottleneck resides in the user 
module. This can be the case in an environment with a high number of 
connections. 

► Additional data modules of the same release can be added to an existing IBM 
Smart Analytics System: This approach is commonly used when the data 
volume increases, and CPU and I/O resources are getting saturated to satisfy 
your workload. 

Adding additional data modules allows you to decrease the amount of active data 
per database partition, and increase parallelism by adding additional CPU and 
I/O storage for the same amount of data. 
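The effect of adding data modules on the amount of active data per database partition is simple division. The Python sketch below uses made-up figures: the 9600 GB of active data, the module counts, and the eight partitions per data module are hypothetical values chosen only to show the arithmetic, not sizing guidance for any particular model.

```python
def active_data_per_partition(total_active_gb, data_modules, partitions_per_module):
    """Average active data per database partition, assuming an even distribution
    (no data skew) across all partitions."""
    return total_active_gb / (data_modules * partitions_per_module)


# Hypothetical system: 9600 GB of active data, 8 database partitions per data module
before = active_data_per_partition(9600, data_modules=3, partitions_per_module=8)
after = active_data_per_partition(9600, data_modules=4, partitions_per_module=8)

print(before, after)                 # GB per partition before and after the expansion
print(1 - after / before)            # relative reduction per partition
```

The reduction is only realized after the data has actually been redistributed across the new database partitions, which is why the planning steps that follow matter.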
Integrating additional servers into an existing cluster requires thorough planning. 

Key aspects of this planning include these: 

► Integrate the new servers into your existing server infrastructure in terms of 
floor space, which includes power and cooling requirements. 

► Integrate the new servers into your existing network (cabling, IP addresses). 

► Configure the new servers as well as other servers in the cluster to ensure a 
well balanced and consistent environment (create the same users with the 
same UID and GID, same OS and kernel parameters, same firmware, and 
same software levels as the existing servers of the cluster). 

► From a DB2 perspective, add additional database partitions. This will require 
you to redistribute part or all your data across all the database partitions, 
depending on your database partition groups layout. 

Data redistribution can be done using the REDISTRIBUTE DATABASE PARTITION 
GROUP command. Other options might be available to redistribute the data, 
depending on your requirements. Due to the numerous prerequisites and 
restrictions, this step requires thorough planning and comprehensive testing. 
This procedure can be time consuming, depending on the amount of data to 
be redistributed. 

From a high level, the steps to add new database partitions are as follows: 

- Add the new partitions to DB2 using, for example, db2start ... 
ADD DBPARTITIONNUM. 

- Expand existing database partition groups to the newly added partitions 
using the command ALTER DATABASE PARTITION GROUP. 

- Alter all table spaces belonging to the previously extended database 
partition groups to add table space containers on the new partitions, using 
the command ALTER TABLESPACE... ADD clause. This step can be time 
consuming on Linux platforms, because Linux does not have fast file 
preallocation. 

- Data can be redistributed on each expanded database partition group 
using the command REDISTRIBUTE DATABASE PARTITION GROUP. 
This particular step has many prerequisites and restrictions. An essential 
prerequisite is to have a full backup and recovery point before engaging in 
a data redistribution activity. Consult the DB2 Information Center for 
additional details: 

http://publib.boulder.ibm.com/infocenter/db2luw/v9r7/topic/com.ibm.db2.luw.admin.partition.doc/doc/t0005017.html 

► From a Tivoli System Automation high availability perspective, integrate the 
additional servers into a high availability group, or create a new high 
availability group. High availability is discussed in Chapter 3, “High availability” 
on page 31. 

A 


Smart Analytics global performance monitoring scripts 


In this appendix we list the formatting scripts discussed in 6.1.1, “Running 
performance troubleshooting commands” on page 129. 

Example A-1 shows the global CPU performance monitoring Perl script 
sa_cpu_mon.pl. 

Example A-1 sa_cpu_mon.pl 

#!/usr/bin/perl 


# Choose which of the following two methods applies 

# 1) on Management Node as user 'root' pick the first method 

# 2) on Admin Node as DB2 instance owner pick the second method 
my (anodes = 'lsnode -N BCUALL' ; 

#my @nodes = 'cat ~/db2nodes.cfg | tr -s 1 1 | cut -d 1 1 -f2 | sort [ uniq"; 
my $row = $nodes [0] ; 

my ($nodegroup, Jnodelist) «=: split (/: /,$row); 


my Jcontinousloop - 1 Y 1 ; 
my Jscriptparm; 
my Jnbrparms; 
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Jnbrparms = J#ARGV + 1; 
if ($nbrparms == 1) 

{ 

Jscriptparm = $ARGV[0]; 
chomp Jscriptparm; 

if (Jscriptparm eq "-s") { $continousloop = ' N 1 } 


if ((Jnbrparms > 1) [ | ((Jnbrparms == 1) && (Jscriptparm ne "-s"))) 

{ 

print "Usage is: sa_cpu_mon.pl -s\n"; 

print "where the optional parameter -s indicates 1 snapshot '\n"; 
print "versus default of continous looping. \n"; 


my Jnbrnodes = $#nodes + 1; 
my @nodeoutputfiles; 
my Jspeci fi c_node_output_fi 1 e; 
my @node_info_array; 
my ($n,$m,$p) = 0; 
my Jnodesleft; 
my Jfirstnodeoutput = 1 Y 1 ; 
my $node_info_row; 

my Jnodename; 

my ($tot_runq, $tot_blockq, $tot_swapin, $tot_swapout, $tot_usrcpu, $tot_syscpu, $tot_idlecpu) ; 
my ($tot_iowaitcpu, Jtot_loadavglmin, $tot_loadavg5min, $tot_loadavgl5min) ; 
my (Javgrunq, $avg_blockq, $avg_swapin, $avg_swapout, $avg_usrcpu, $avg_syscpu, $avg_idlecpu) ; 
my ($avg_iowaitcpu, $avg_loadavglmin, $avg_loadavg5min, $avg_loadavgl5min) ; 

my @array_blockq; 
my @array_usrcpu; 
my @array_syscpu; 
my @array_idlecpu; 

my @array_l oadavglmi n; 
my @array_loadavg5min; 
my @array_loadavgl5min; 


$n = 0; 

$nodesleft = $nbrnodes; 
$fi rstnodeoutput - ‘Y 1 ; 
while ($nodesleft) 


chomp $nodes[$n]; 
local *N0DE0UT ; 

open (N0DE0UT, "ssh $nodes[$n] 'echo 
|| die "fork error: $!"; 
$nodeoutputf i 1 es [$n] = *N0DE0UT; 

$n = $n + 1; 

$nodesleft = Jnodesleft - 1; 


'hostname": "vmstat 2 


foreach Jspeci fic_node_output_file (@nodeoutputfiles) 

{ 

whi 1 e (<Jspeci f 1 c_node_output_f 1 1 e>) 


tail -1" 'uptime' 1 & |") 
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{ if ("$firstnodeoutput" eq "Y") 

{ headerQ; Jfirstnodeoutput = "N"; } Jnode_i nfo_row = $_ 
close $specific_node_output_file | die "child cmd error: $! $?"; 


compute_and_print_system_summary() ; 

for ($p = 0; $p < $nbrnodes; $p++) 

{ 

format_and_pri nt ($p) ; 


} while (Jcontinousloop eq 1 Y 1 ) ; 


extract_info($m) ; } 




if (Jcontinousloop eq ' Y 1 ) { system ("clear"); } 

print "sa_cpLi_mon Run Block CPU - Load Average 


print " Queue Queue usr sys idle wio lmin 5mins 

15mins\n"; 


sub extract_info 

{ 

my $i - shift; 
chomp $node_info_row; 


$na variable is 'not applicable', i.e. we don't 
(Jnodename, $runq, Jblockq, $na, $na, $na, $na, 
Jsyscpu, Jidlecpu, Jiowaitcpu, $uptime) 

= splitC ' , Jnode_info_row,18) ; 

($na, $na, $na, $na, $na, $na, $na, $na, $na, $n 
= splitf ' ,Juptime,13); 
oadavglmin, Jna) = spl i t ( ' , 1 ,$loadavglmin,2) ; 
oadavg5min, Jna) = spl i t ( ' , ' ,$loadavg5min,2) ; 


need it's value (it's simply a placeholder): 
$swapin, $swapout, $na, $na, $na, $na, $usrcpu 

na, Jloadavglmin, $loadavg5min, $loadavg!5min) 


tot_blockq 

tot_usrcpu 

tot_syscpu 

tot_idlecpu 

tot_loadavg!5min 


= $tot_runq + Jrunq; 

= $tot_blockq + Jblockq; 

= $tot_usrcpu + Jusrcpu; 

= $tot_syscpu + Jsyscpu; 

= $tot_idlecpu + Jidlecpu; 

- Jtot_iowaitcpu + Jiowaitcpu; 

= Jtot_loadavglmin + Jloadavglmin; 

= Jtot_loadavg5min + Jloadavg5min; 

= Jtot_loadavgl5min + J1 oadavgl5min; 


array_nodename [Ji ] 
array_runq[Ji] 
array_blockq [Ji] 
array_usrcpu[Ji] 
array_syscpu[Ji] 
array_idlecpu[Ji] 
array_iowaitcpu[Ji] 
array_l oadavglmi n [Ji ] 
array_l oadavg5mi n [Ji ] 
array_l oadavgl5mi n [Ji] 


= Jnodename; 

= Jrunq; 

= Jblockq; 

= Jusrcpu; 

= Jsyscpu; 

= Jidlecpu; 

= Jiowaitcpu; 

= Jloadavglmin; 

= Jloadavg5min; 

= Jloadavgl5min; 


sub compute_and_print_system_summary
{
    $nodename = "System Avg:";
    $avg_runq         = $tot_runq / $nbrnodes;
    $avg_blockq       = $tot_blockq / $nbrnodes;
    $avg_usrcpu       = $tot_usrcpu / $nbrnodes;
    $avg_syscpu       = $tot_syscpu / $nbrnodes;
    $avg_idlecpu      = $tot_idlecpu / $nbrnodes;
    $avg_iowaitcpu    = $tot_iowaitcpu / $nbrnodes;
    $avg_loadavg1min  = $tot_loadavg1min / $nbrnodes;
    $avg_loadavg5min  = $tot_loadavg5min / $nbrnodes;
    $avg_loadavg15min = $tot_loadavg15min / $nbrnodes;

    printf ("%11s %6.1f %6.1f %5.1f %5.1f %5.1f %5.1f %7.2f %7.2f %7.2f\n",
        $nodename, $avg_runq, $avg_blockq, $avg_usrcpu, $avg_syscpu, $avg_idlecpu,
        $avg_iowaitcpu, $avg_loadavg1min, $avg_loadavg5min, $avg_loadavg15min);
}


sub format_and_print
{
    my $j = shift;

    printf ("%11s %6.1f %6.1f %5.1f %5.1f %5.1f %5.1f %5s %5s %5s\n",
        $array_nodename[$j], $array_runq[$j], $array_blockq[$j], $array_usrcpu[$j],
        $array_syscpu[$j],
        $array_idlecpu[$j], $array_iowaitcpu[$j], $array_loadavg1min[$j], $array_loadavg5min[$j],
        $array_loadavg15min[$j]);
}



sub reset_counters
{
    ($tot_runq, $tot_blockq, $tot_swapin, $tot_swapout, $tot_usrcpu, $tot_syscpu, $tot_idlecpu) = 0;
    ($tot_iowaitcpu, $tot_loadavg1min, $tot_loadavg5min, $tot_loadavg15min) = 0;

    ($avg_runq, $avg_blockq, $avg_swapin, $avg_swapout, $avg_usrcpu, $avg_syscpu, $avg_idlecpu) = 0;
    ($avg_iowaitcpu, $avg_loadavg1min, $avg_loadavg5min, $avg_loadavg15min) = 0;
}
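
The two central steps of the CPU monitoring script, extracting the load averages from `uptime` output and averaging each metric across nodes for the "System Avg:" row, can be sketched in a few lines. The following Python fragment is only an illustration under stated assumptions: the function names `parse_load_averages` and `system_average`, the dictionary layout, and the sample values are hypothetical and are not part of the original Perl script.

```python
def parse_load_averages(uptime_line):
    """Return the 1-, 5- and 15-minute load averages from one line of `uptime` output."""
    # Everything after "load average:" is a comma-separated list of three numbers.
    tail = uptime_line.split("load average:")[1]
    return tuple(float(value) for value in tail.split(","))


def system_average(node_rows):
    """Average every numeric metric across nodes, as the 'System Avg:' row does."""
    metrics = [key for key in node_rows[0] if key != "nodename"]
    count = len(node_rows)
    summary = {"nodename": "System Avg:"}
    for metric in metrics:
        summary[metric] = sum(row[metric] for row in node_rows) / count
    return summary


# Hypothetical samples for two nodes (values invented for illustration).
rows = [
    {"nodename": "node01:", "runq": 2, "blockq": 0, "loadavg1min": 1.5},
    {"nodename": "node02:", "runq": 4, "blockq": 1, "loadavg1min": 0.5},
]
print(system_average(rows))
print(parse_load_averages(" 10:15:01 up 23 days,  4:12,  3 users,  load average: 0.05, 0.10, 0.15"))
```

Keeping the per-node rows separate from the summary mirrors the script's design: the totals are accumulated once per pass, and the per-node values are kept in arrays so that they can be printed after the system average.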


Example A-2 shows the global I/O performance monitoring Perl script
sa_io_mon.pl.

Example A-2 sa_io_mon.pl 


# Author : Patrick Thoreson

# Company : IBM

# Date : Oct 7th, 2010


# Choose which of the following two methods applies
# 1) on Management Node as user 'root' pick the first method
# 2) on Admin Node as DB2 instance owner pick the second method
my @nodes = `lsnode`;

#my @nodes = `cat ~/db2nodes.cfg | tr -s ' ' | cut -d' ' -f2 | sort | uniq`;

my $row = $nodes[0];
chomp $row;
my ($nodegroup, $nodelist) = split (/: /,$row);


my $continousloop = 'Y';
my $scriptparm;
my $nbrparms;

$nbrparms = $#ARGV + 1;
if ($nbrparms == 1)
{
    $scriptparm = $ARGV[0];
    chomp $scriptparm;
    if ($scriptparm eq "-s") { $continousloop = 'N'; }
}

if (($nbrparms > 1) || (($nbrparms == 1) && ($scriptparm ne "-s")))
{
    print "Usage is: sa_io_mon.pl -s\n";
    print "where the optional parameter -s indicates 'snapshot'\n";
    print "versus default of continous looping.\n";
}


my $nbrnodes = $#nodes + 1;
my @nodeoutputfiles;
my $specific_node_output_file;
my ($n,$m,$p) = 0;
my $nodesleft;
my $firstnodeoutput = 'Y';
my $node_info_row;
my @node_info_array;

my $nodename;
my $blockq;
my $blockq_info;
my ($usrcpu,$syscpu,$idlecpu,$iowaitcpu);
my $iostat_output2;
my $cpu_info;
my $io_info;
my $iodev;
my ($tps, $rtps, $wtps, $rKBps, $wKBps);
my $iodevinfo;
my $iodevremainder;
my $iodevnewremainder;
my $device_count;
my $devutil;
my $cumulative_devutil;
my $node_devutil;
my ($cumulative_rtps, $cumulative_wtps, $cumulative_rKBps, $cumulative_wKBps);
## my ($node_navg_tps, $node_ntot_tps, $node_navg_rtps, $node_ntot_rtps, $node_navg_wtps, $node_ntot_wtps);
my ($node_ntot_tps, $node_ntot_rtps, $node_ntot_wtps);
## my ($node_navg_rKBps, $node_ntot_rKBps, $node_navg_rKB, $node_ntot_rKB, $node_navg_wKBps, $node_ntot_wKBps, $node_navg_wKB, $node_ntot_wKB);
my ($node_ntot_rKBps, $node_ntot_rKB, $node_ntot_wKBps, $node_ntot_wKB);
my $device_light_use;
my $device_medium_use;
my $device_near_max_use;

my ($tot_blockq, $tot_usrcpu, $tot_syscpu, $tot_idlecpu, $tot_iowaitcpu, $tot_device_count,
    $tot_devutil);
my ($tot_device_in_use, $tot_device_light_use, $tot_device_medium_use, $tot_device_heavy_use,
    $tot_device_near_max_use);
##my ($tot_navg_tps, $tot_navg_rKB, $tot_navg_wKB, $tot_ntot_tps, $tot_ntot_rKB, $tot_ntot_wKB);
my ($tot_ntot_tps, $tot_ntot_rKB, $tot_ntot_wKB);

my ($savg_blockq, $savg_usrcpu, $savg_syscpu, $savg_idlecpu, $savg_iowaitcpu, $savg_device_count,
    $savg_devutil);
my ($savg_device_in_use, $savg_device_light_use, $savg_device_medium_use, $savg_device_heavy_use,
    $savg_device_near_max_use);
##my ($savg_navg_tps, $savg_navg_rKB, $savg_navg_wKB, $savg_ntot_tps, $savg_ntot_rKB, $savg_ntot_wKB);
my ($savg_ntot_tps, $savg_ntot_rKB, $savg_ntot_wKB);

my @array_nodename;
my @array_runq;
my @array_blockq;
my @array_usrcpu;
my @array_syscpu;
my @array_idlecpu;
my @array_iowaitcpu;
##my @array_navg_tps;
##my @array_navg_rKB;
##my @array_navg_wKB;
my @array_ntot_tps;
my @array_ntot_rKB;
my @array_ntot_wKB;
my @array_device_in_use;
my @array_device_light_use;
my @array_device_medium_use;
my @array_device_heavy_use;


do
{
    $n = 0;
    $nodesleft = $nbrnodes;
    $firstnodeoutput = 'Y';
    while ($nodesleft)
    {
        chomp $nodes[$n];
        local *NODEOUT;
        open (NODEOUT, "ssh $nodes[$n] 'echo `hostname`:AAAAA`iostat -k -x 5 2`AAAAA`grep procs_blocked /proc/stat`' 2>&1 |")
            || die "fork error: $!";
        $nodeoutputfiles[$n] = *NODEOUT;
        $n = $n + 1;
        $nodesleft = $nodesleft - 1;
    }

    reset_counters();

    $m = 0;
    foreach $specific_node_output_file (@nodeoutputfiles)
    {
        while (<$specific_node_output_file>)
        { if ("$firstnodeoutput" eq "Y")
          { header(); $firstnodeoutput = "N"; } $node_info_row = $_ ; extract_info($m); }
        close $specific_node_output_file || die "child cmd error: $! $?";
        $m = $m + 1;
    }

    compute_and_print_system_summary();
    for ($p = 0; $p < $nbrnodes; $p++)
    {
        format_and_print($p);
    }
} while ($continousloop eq 'Y');


sub extract_info
{
    my $i = shift;
    chomp $node_info_row;

    ($device_in_use, $device_light_use, $device_medium_use, $device_heavy_use, $device_near_max_use)
        = 0;
    ($tps, $rtps, $wtps, $rKBps, $wKBps) = 0;
    $blockq_info = '';

    # The $na variable is 'not applicable', i.e. we don't need its value (it's simply a placeholder):
    ($nodename, $iostat_output, $blockq_info)
        = split('AAAAA', $node_info_row, 3);
    ($na, $blockq)
        = split(' ', $blockq_info, 2);
    ($na, $na, $iostat_output2)
        = split('avg-cpu: ', $iostat_output, 3);
    ($cpu_info, $io_info)
        = split('Device: ', $iostat_output2, 2);
    ($na, $na, $na, $na, $na, $na, $usrcpu, $na, $syscpu, $iowaitcpu, $na, $idlecpu, $na)
        = split(' ', $cpu_info, 13);
    ($na, $na, $na, $na, $na, $na, $na, $na, $na, $na, $na, $iodevremainder)
        = split(' ', $io_info, 12);


    $device_count = 0;
    $node_devutil = 0;
    $cumulative_devutil = 0;
    $node_ntot_tps = 0;
    ## $node_navg_tps = 0;
    $node_ntot_rtps = 0;
    ## $node_navg_rtps = 0;
    $cumulative_rtps = 0;
    $node_ntot_wtps = 0;
    ## $node_navg_wtps = 0;
    $cumulative_wtps = 0;
    $node_ntot_rKBps = 0;
    ## $node_navg_rKBps = 0;
    $node_ntot_rKB = 0;
    ## $node_navg_rKB = 0;
    $cumulative_rKBps = 0;
    $node_ntot_wKBps = 0;
    ## $node_navg_wKBps = 0;
    $node_ntot_wKB = 0;
    ## $node_navg_wKB = 0;
    $cumulative_wKBps = 0;
    while ($iodevremainder)
    {
        ($iodev, $na, $na, $rtps, $wtps, $rKBps, $wKBps, $na, $na, $na, $na, $devutil,
         $iodevnewremainder)
            = split(' ', $iodevremainder, 13);
        $iodevremainder = $iodevnewremainder;
        $iodevnewremainder = '';

        # On the IBM Smart Analytics 5600 in our lab, the "sdb", "sdc", "sdd" and "sde" devices IO stats are
        # covered by using the IO stats of "dm-0", "dm-1", "dm-2" and "dm-3",
        # since they are mapped to the very same physical LUN device; for example, if we were to count the
        # stats of both "sdc" and "dm-1" we would be counting
        # the real IO stats twice for that real physical device.
        # Hence, to avoid this redundant "double-counting" of io device statistics, we skip collecting stats
        # for "sdb", "sdc", "sdd" and "sde"
        # as they will be collected already under "dm-0", "dm-1", "dm-2" and "dm-3".
        if ( ("$iodev" eq "sdb") || ("$iodev" eq "sdc") || ("$iodev" eq "sdd") || ("$iodev" eq "sde")
           ) { next; }

        $device_count = $device_count + 1;
        $cumulative_devutil = $cumulative_devutil + $devutil;
        $cumulative_rtps = $cumulative_rtps + $rtps;
        $cumulative_wtps = $cumulative_wtps + $wtps;
        $cumulative_rKBps = $cumulative_rKBps + $rKBps;
        $cumulative_wKBps = $cumulative_wKBps + $wKBps;

        if ( $devutil > 0) { $device_in_use = $device_in_use + 1; };
        if (($devutil > 0) && ($devutil < 30)) { $device_light_use = $device_light_use + 1; }
        if (($devutil >= 30) && ($devutil < 60)) { $device_medium_use = $device_medium_use + 1; }
        if (($devutil >= 60) && ($devutil < 90)) { $device_heavy_use = $device_heavy_use + 1; }
        if ( $devutil >= 90) { $device_near_max_use = $device_near_max_use + 1; };
    }


    $node_devutil = $cumulative_devutil / $device_count;
    $node_ntot_rtps = $cumulative_rtps;
    ## $node_navg_rtps = $cumulative_rtps / $device_count;
    $node_ntot_wtps = $cumulative_wtps;
    ## $node_navg_wtps = $cumulative_wtps / $device_count;
    $node_ntot_tps = $node_ntot_rtps + $node_ntot_wtps;
    ## $node_navg_tps = $node_navg_rtps + $node_navg_wtps;
    $node_ntot_rKBps = $cumulative_rKBps;
    ## $node_navg_rKBps = $cumulative_rKBps / $device_count;
    $node_ntot_wKBps = $cumulative_wKBps;
    ## $node_navg_wKBps = $cumulative_wKBps / $device_count;

    # Multiply the rKBps (read KB per second) by 5 seconds since that is the interval we used in the
    # parallel ssh commands in this script to obtain the rKB;
    # do the same with wKBps to obtain the wKB.
    # If that time interval changes to another number, adjust in both multiplication lines below:
    # (for example: iostat -k -x 5 2 --> iostat -k -x 10 2, change '$node_rKBps * 5' to
    # '$node_rKBps * 10', and do the same for $node_wKBps)
    $node_ntot_rKB    = $node_ntot_rKBps * 5;
    $node_ntot_wKB    = $node_ntot_wKBps * 5;
    ## $node_navg_rKB = $node_navg_rKBps * 5;
    ## $node_navg_wKB = $node_navg_wKBps * 5;


    $tot_blockq              = $tot_blockq + $blockq;
    $tot_usrcpu              = $tot_usrcpu + $usrcpu;
    $tot_syscpu              = $tot_syscpu + $syscpu;
    $tot_idlecpu             = $tot_idlecpu + $idlecpu;
    $tot_iowaitcpu           = $tot_iowaitcpu + $iowaitcpu;
    $tot_device_count        = $tot_device_count + $device_count;
    $tot_devutil             = $tot_devutil + $node_devutil;
    $tot_device_in_use       = $tot_device_in_use + $device_in_use;
    $tot_device_light_use    = $tot_device_light_use + $device_light_use;
    $tot_device_medium_use   = $tot_device_medium_use + $device_medium_use;
    $tot_device_heavy_use    = $tot_device_heavy_use + $device_heavy_use;
    $tot_device_near_max_use = $tot_device_near_max_use + $device_near_max_use;
    ## $tot_navg_tps         = $tot_navg_tps + $node_navg_tps;
    ## $tot_navg_rKB         = $tot_navg_rKB + $node_navg_rKB;
    ## $tot_navg_wKB         = $tot_navg_wKB + $node_navg_wKB;
    $tot_ntot_tps            = $tot_ntot_tps + $node_ntot_tps;
    $tot_ntot_rKB            = $tot_ntot_rKB + $node_ntot_rKB;
    $tot_ntot_wKB            = $tot_ntot_wKB + $node_ntot_wKB;

    $array_nodename[$i]            = $nodename;
    $array_blockq[$i]              = $blockq;
    $array_usrcpu[$i]              = $usrcpu;
    $array_syscpu[$i]              = $syscpu;
    $array_idlecpu[$i]             = $idlecpu;
    $array_iowaitcpu[$i]           = $iowaitcpu;
    $array_device_count[$i]        = $device_count;
    $array_devutil[$i]             = $node_devutil;
    $array_device_in_use[$i]       = $device_in_use;
    $array_device_light_use[$i]    = $device_light_use;
    $array_device_medium_use[$i]   = $device_medium_use;
    $array_device_heavy_use[$i]    = $device_heavy_use;
    $array_device_near_max_use[$i] = $device_near_max_use;
    ## $array_navg_tps[$i]         = $node_navg_tps;
    ## $array_navg_rKB[$i]         = $node_navg_rKB;
    ## $array_navg_wKB[$i]         = $node_navg_wKB;
    $array_ntot_tps[$i]            = $node_ntot_tps;
    $array_ntot_rKB[$i]            = $node_ntot_rKB;
    $array_ntot_wKB[$i]            = $node_ntot_wKB;
}

sub header
{
    if ($continousloop eq 'Y') { system ("clear"); }
    ## print "                  CPU\n";
    ## print "             Block    Tot    Tot    Tot    Tot    Tot  Avg/dev #Active -- Nbr devices in %util range -- -- Avg/device for node -- -- Tot all devices on node --\n";
    ## print "             Queue    usr    sys   idle    wio #devices  %util devices 0-30% 30-60% 60-90% 90-100%  tps  readKB  writeKB  tps  readKB  writeKB\n";
    print "                  CPU                    IO Device Usage\n";
    print "             Block    Tot    Tot    Tot    Tot    Tot  Avg/dev #Active -- Nbr devices in %util range -- -- Tot all devices on node --\n";
    print "             Queue    usr    sys   idle    wio #devices  %util devices 0-30% 30-60% 60-90% 90-100%  tps  readKB  writeKB\n";
}


sub compute_and_print_system_summary
{
    $nodename = "System Avg:";
    $savg_blockq              = $tot_blockq / $nbrnodes;
    $savg_usrcpu              = $tot_usrcpu / $nbrnodes;
    $savg_syscpu              = $tot_syscpu / $nbrnodes;
    $savg_idlecpu             = $tot_idlecpu / $nbrnodes;
    $savg_iowaitcpu           = $tot_iowaitcpu / $nbrnodes;
    $savg_device_count        = $tot_device_count / $nbrnodes;
    $savg_devutil             = $tot_devutil / $nbrnodes;
    $savg_device_in_use       = $tot_device_in_use / $nbrnodes;
    $savg_device_light_use    = $tot_device_light_use / $nbrnodes;
    $savg_device_medium_use   = $tot_device_medium_use / $nbrnodes;
    $savg_device_heavy_use    = $tot_device_heavy_use / $nbrnodes;
    $savg_device_near_max_use = $tot_device_near_max_use / $nbrnodes;
    ## $savg_navg_tps         = $tot_navg_tps / $nbrnodes;
    ## $savg_navg_rKB         = $tot_navg_rKB / $nbrnodes;
    ## $savg_navg_wKB         = $tot_navg_wKB / $nbrnodes;
    $savg_ntot_tps            = $tot_ntot_tps / $nbrnodes;
    $savg_ntot_rKB            = $tot_ntot_rKB / $nbrnodes;
    $savg_ntot_wKB            = $tot_ntot_wKB / $nbrnodes;

    ## printf ("%11s %6.1f %6.2f %6.2f %6.2f %6.2f %6.0f %6.2f %5.1f %5.1f %5.1f %5.1f %5.1f %8.1f%11.1f%11.1f %8.1f%12.1f%12.1f\n",
    printf ("%11s %6.1f %6.2f %6.2f %6.2f %6.2f %6.0f %6.2f %5.1f %5.1f %5.1f %5.1f %5.1f %8.1f%12.1f%12.1f\n",
        $nodename, $savg_blockq, $savg_usrcpu, $savg_syscpu, $savg_idlecpu, $savg_iowaitcpu,
        $savg_device_count, $savg_devutil, $savg_device_in_use,
        $savg_device_light_use, $savg_device_medium_use, $savg_device_heavy_use,
        $savg_device_near_max_use,
        ## $savg_navg_tps, $savg_navg_rKB, $savg_navg_wKB, $savg_ntot_tps, $savg_ntot_rKB, $savg_ntot_wKB);
        $savg_ntot_tps, $savg_ntot_rKB, $savg_ntot_wKB);
}





sub format_and_print
{
    my $j = shift;

    ## printf ("%11s %6.1f %6.2f %6.2f %6.2f %6.2f %6.0f %6.2f %5.1f %5.1f %5.1f %5.1f %5.1f %8.1f%11.1f%11.1f %8.1f%12.1f%12.1f\n",
    printf ("%11s %6.1f %6.2f %6.2f %6.2f %6.2f %6.0f %6.2f %5.1f %5.1f %5.1f %5.1f %5.1f %8.1f%12.1f%12.1f\n",
        $array_nodename[$j], $array_blockq[$j], $array_usrcpu[$j], $array_syscpu[$j],
        $array_idlecpu[$j], $array_iowaitcpu[$j],
        $array_device_count[$j], $array_devutil[$j], $array_device_in_use[$j],
        $array_device_light_use[$j], $array_device_medium_use[$j], $array_device_heavy_use[$j],
        $array_device_near_max_use[$j],
        ## $array_navg_tps[$j], $array_navg_rKB[$j], $array_navg_wKB[$j], $array_ntot_tps[$j], $array_ntot_rKB[$j], $array_ntot_wKB[$j]);
        $array_ntot_tps[$j], $array_ntot_rKB[$j], $array_ntot_wKB[$j]);
}



sub reset_counters
{
    ($tot_blockq, $tot_usrcpu, $tot_syscpu, $tot_idlecpu, $tot_iowaitcpu, $tot_device_count,
     $tot_devutil) = 0;
    ($tot_device_in_use, $tot_device_light_use, $tot_device_medium_use, $tot_device_heavy_use,
     $tot_device_near_max_use) = 0;
    ## ($tot_navg_tps, $tot_navg_rKB, $tot_navg_wKB, $tot_ntot_tps, $tot_ntot_rKB, $tot_ntot_wKB) = 0;
    ($tot_ntot_tps, $tot_ntot_rKB, $tot_ntot_wKB) = 0;

    ($savg_blockq, $savg_usrcpu, $savg_syscpu, $savg_idlecpu, $savg_iowaitcpu, $savg_device_count,
     $savg_devutil) = 0;
    ($savg_device_in_use, $savg_device_light_use, $savg_device_medium_use, $savg_device_heavy_use,
     $savg_device_near_max_use) = 0;
    ## ($savg_navg_tps, $savg_navg_rKB, $savg_navg_wKB, $savg_ntot_tps, $savg_ntot_rKB, $savg_ntot_wKB) = 0;
    ($savg_ntot_tps, $savg_ntot_rKB, $savg_ntot_wKB) = 0;

    ($device_in_use, $device_light_use, $device_medium_use, $device_heavy_use, $device_near_max_use)
        = 0;
    ($tps, $rtps, $wtps, $rKBps, $wKBps) = 0;
    $iodevremainder = '';
    $iodevnewremainder = '';
}
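
The device loop in the I/O monitoring script combines two ideas: it skips devices whose I/O is already reported under a device-mapper name, and it sorts the remaining devices into utilization ranges. The Python fragment below is a minimal sketch of that logic, assuming the `iostat -x` rows have already been reduced to tuples; the function name `summarize_devices`, the tuple layout, and the sample numbers are hypothetical.

```python
# Devices skipped because, per the script's comments, their I/O is already
# reported under "dm-0" .. "dm-3" on the lab system.
SKIPPED_DEVICES = {"sdb", "sdc", "sdd", "sde"}


def summarize_devices(device_stats, skipped=SKIPPED_DEVICES):
    """Aggregate (device, r/s, w/s, rkB/s, wkB/s, %util) tuples for one node."""
    summary = {"device_count": 0, "in_use": 0, "light": 0, "medium": 0,
               "heavy": 0, "near_max": 0, "tps": 0.0, "rKBps": 0.0, "wKBps": 0.0}
    total_util = 0.0
    for name, rtps, wtps, rkbps, wkbps, util in device_stats:
        if name in skipped:
            continue  # avoid counting the same physical LUN twice
        summary["device_count"] += 1
        total_util += util
        summary["tps"] += rtps + wtps
        summary["rKBps"] += rkbps
        summary["wKBps"] += wkbps
        if util > 0:
            summary["in_use"] += 1
        # Same ranges as the script: 0-30, 30-60, 60-90, and 90-100 percent.
        if 0 < util < 30:
            summary["light"] += 1
        elif 30 <= util < 60:
            summary["medium"] += 1
        elif 60 <= util < 90:
            summary["heavy"] += 1
        elif util >= 90:
            summary["near_max"] += 1
    count = summary["device_count"]
    # Guard against a node that reports no countable devices.
    summary["avg_util"] = total_util / count if count else 0.0
    return summary


# Hypothetical sample: "sdb" duplicates "dm-0" and is therefore ignored.
sample = [
    ("sda", 1.0, 2.0, 8.0, 16.0, 10.0),
    ("sdb", 5.0, 5.0, 50.0, 50.0, 95.0),
    ("dm-0", 5.0, 5.0, 50.0, 50.0, 95.0),
    ("dm-1", 0.0, 0.0, 0.0, 0.0, 0.0),
]
print(summarize_devices(sample))
```

Unlike the Perl listing, this sketch returns an average utilization of zero when no device is counted instead of dividing by zero; that guard is an addition made for the illustration.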
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Example A-3 shows the global paging and memory resources performance 
monitoring Perl script sa_paging_mon.pl. 

Example A-3 sa_paging_mon.pl 


# Script Name: sa_paging_mon.pl 

# Author : Patrick Thoreson 

# Company : IBM 

# Date : Oct 11th, 2010 


# Choose which of the following two methods applies 

# 1) on Management Node as user 'root' pick the first method 

# 2) on Admin Node as DB2 instance owner pick the second method 
my @nodes = `lsnode -N BCUALL`;

#my @nodes = `lsnode`;

#my @nodes = `cat ~/db2nodes.cfg | tr -s ' ' | cut -d' ' -f2 | sort | uniq`;

my $row = $nodes[0];
chomp $row;

my ($nodegroup, $nodelist) = split (/: /,$row);


my $continousloop = 'Y';
my $scriptparm;


$nbrparms = $#ARGV + 1;
if ($nbrparms == 1)
{
    $scriptparm = $ARGV[0];
    chomp $scriptparm;
    if ($scriptparm eq "-s") { $continousloop = 'N'; }
}

if (($nbrparms > 1) || (($nbrparms == 1) && ($scriptparm ne "-s")))
{
    print "Usage is: sa_paging_mon.pl -s\n";
    print "where the optional parameter -s indicates 'snapshot'\n";
    print "versus default of continous looping.\n";
}


my $nbrnodes = $#nodes + 1;
my @nodeoutputfiles;
my $specific_node_output_file;
my @node_info_array;
my ($n,$m,$p) = 0;
my $nodesleft;
my $firstnodeoutput = 'Y';
my $node_info_row;


my ($tot_runq, $tot_blockq, $tot_swapin, $tot_swapout, $tot_usrcpu, $tot_syscpu, $tot_idlecpu);
my ($tot_iowaitcpu, $tot_node_total_mem, $tot_node_used_mem, $tot_node_free_mem);
my ($tot_node_total_swap, $tot_node_used_swap, $tot_node_free_swap);

my ($avg_runq, $avg_blockq, $avg_swapin, $avg_swapout, $avg_usrcpu, $avg_syscpu, $avg_idlecpu);
my ($avg_iowaitcpu, $avg_node_total_mem, $avg_node_used_mem, $avg_node_free_mem);
my ($avg_node_total_swap, $avg_node_used_swap, $avg_node_free_swap);

my @array_nodename;
my @array_blockq;
my @array_swapin;
my @array_swapout;
my @array_usrcpu;
my @array_syscpu;
my @array_idlecpu;

my @array_node_total_mem;
my @array_node_used_mem;

my @array_node_total_swap;
my @array_node_used_swap;
my @array_node_free_swap;


do
{
    $n = 0;
    $nodesleft = $nbrnodes;
    $firstnodeoutput = 'Y';
    while ($nodesleft)
    {
        chomp $nodes[$n];
        local *NODEOUT;
        open (NODEOUT, "ssh $nodes[$n] 'echo `hostname`: `vmstat 5 2 | t
            || die "fork error: $!";
        $nodeoutputfiles[$n] = *NODEOUT;
        $n = $n + 1;
        $nodesleft = $nodesleft - 1;
    }

    reset_counters();

    $m = 0;
    foreach $specific_node_output_file (@nodeoutputfiles)
    {
        while (<$specific_node_output_file>)
        { if ("$firstnodeoutput" eq "Y")
          { header(); $firstnodeoutput = "N"; } $node_info_row = $_ ; extract_info($m); }
#         { header(); $firstnodeoutput = "N"; } $node_info_row = $_ ; print; }
        close $specific_node_output_file || die "child cmd error: $! $?";
        $m = $m + 1;
    }

    compute_and_print_system_summary();

    for ($p = 0; $p < $nbrnodes; $p++)
    {
        format_and_print($p);
    }

} while ($continousloop eq 'Y');




sub header
{
    if ($continousloop eq 'Y') { system ("clear"); }
    print "sa_paging_mon   Run  Block  CPU  -- Page Swapping --  Real Memory  Swap Space \n";
    print "                Total  Used  Free  Total  Free \n";
}

sub extract_info
{
    my $i = shift;
    chomp $node_info_row;

    # The $na variable is 'not applicable', i.e. we don't need its value (it's simply a placeholder):
    my ($nodename, $runq, $blockq, $na, $na, $na, $na, $swapin, $swapout, $na, $na, $na, $na, $usrcpu,
        $syscpu, $idlecpu, $iowaitcpu, $na, $node_mem_info)
        = split(' ', $node_info_row, 19);
    #   = split(' ', $node_info_row);
    my ($node_total_mem, $na, $na, $node_used_mem, $na, $na,
        $na, $na, $na, $na, $na, $na,
        $node_free_mem, $na, $na, $na, $na, $na,
        $na, $na, $na, $node_total_swap, $na, $na,
        $node_used_swap, $na, $na, $node_free_swap, $na)
        = split(' ', $node_mem_info, 29);

    $array_nodename[$i]        = $nodename;
    $array_runq[$i]            = $runq;
    $array_blockq[$i]          = $blockq;
    $array_swapin[$i]          = $swapin;
    $array_swapout[$i]         = $swapout;
    $array_usrcpu[$i]          = $usrcpu;
    $array_syscpu[$i]          = $syscpu;
    $array_idlecpu[$i]         = $idlecpu;
    $array_iowaitcpu[$i]       = $iowaitcpu;
    $array_node_total_mem[$i]  = $node_total_mem;
    $array_node_used_mem[$i]   = $node_used_mem;
    $array_node_free_mem[$i]   = $node_free_mem;
    $array_node_total_swap[$i] = $node_total_swap;
    $array_node_used_swap[$i]  = $node_used_swap;
    $array_node_free_swap[$i]  = $node_free_swap;
}



sub compute_and_print_system_summary
{
    $nodename = "System Avg:";
    $avg_runq            = $tot_runq / $nbrnodes;
    $avg_blockq          = $tot_blockq / $nbrnodes;
    $avg_swapin          = $tot_swapin / $nbrnodes;
    $avg_swapout         = $tot_swapout / $nbrnodes;
    $avg_usrcpu          = $tot_usrcpu / $nbrnodes;
    $avg_syscpu          = $tot_syscpu / $nbrnodes;
    $avg_idlecpu         = $tot_idlecpu / $nbrnodes;
    $avg_iowaitcpu       = $tot_iowaitcpu / $nbrnodes;
    $avg_node_total_mem  = $tot_node_total_mem / $nbrnodes;
    $avg_node_used_mem   = $tot_node_used_mem / $nbrnodes;
    $avg_node_free_mem   = $tot_node_free_mem / $nbrnodes;
    $avg_node_total_swap = $tot_node_total_swap / $nbrnodes;
    $avg_node_used_swap  = $tot_node_used_swap / $nbrnodes;
    $avg_node_free_swap  = $tot_node_free_swap / $nbrnodes;

    printf ("%11s %6.1f %6.1f %5.1f %5.1f %5.1f %5.1f %5d %5d %10d %10d %10d %10d %10d %10d\n",
        $nodename, $avg_runq, $avg_blockq, $avg_usrcpu, $avg_syscpu, $avg_idlecpu,
        $avg_iowaitcpu, $avg_swapin, $avg_swapout,
        $avg_node_total_mem, $avg_node_used_mem, $avg_node_free_mem,
        $avg_node_total_swap, $avg_node_used_swap, $avg_node_free_swap);
}


sub format_and_print
{
    my $j = shift;

    printf ("%11s %6.1f %6.1f %5.1f %5.1f %5.1f %5.1f %5d %5d %10d %10d %10d %10d %10d %10d\n",
        $array_nodename[$j], $array_runq[$j], $array_blockq[$j], $array_usrcpu[$j],
        $array_syscpu[$j],
        $array_idlecpu[$j], $array_iowaitcpu[$j], $array_swapin[$j], $array_swapout[$j],
        $array_node_total_mem[$j], $array_node_used_mem[$j], $array_node_free_mem[$j],
        $array_node_total_swap[$j], $array_node_used_swap[$j], $array_node_free_swap[$j]);
}



sub reset_counters
{
    ($tot_runq, $tot_blockq, $tot_swapin, $tot_swapout, $tot_usrcpu, $tot_syscpu, $tot_idlecpu) = 0;
    ($tot_iowaitcpu, $tot_node_total_mem, $tot_node_used_mem, $tot_node_free_mem) = 0;
    ($tot_node_total_swap, $tot_node_used_swap, $tot_node_free_swap) = 0;

    ($avg_runq, $avg_blockq, $avg_swapin, $avg_swapout, $avg_usrcpu, $avg_syscpu, $avg_idlecpu) = 0;
    ($avg_iowaitcpu, $avg_node_total_mem, $avg_node_used_mem, $avg_node_free_mem) = 0;
    ($avg_node_total_swap, $avg_node_used_swap, $avg_node_free_swap) = 0;
}
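
The paging script picks individual values out of a `vmstat` data line by position, discarding the unneeded columns through the `$na` placeholder. A name-based variant of the same idea is sketched below in Python. It assumes the usual 17-column layout printed by the Linux procps `vmstat`; the function name `parse_vmstat_line` and the sample line are hypothetical.

```python
# Column order of a Linux (procps) vmstat data line; assumed here, so check
# the header that vmstat prints on the target system before relying on it.
VMSTAT_COLUMNS = ["r", "b", "swpd", "free", "buff", "cache", "si", "so",
                  "bi", "bo", "in", "cs", "us", "sy", "id", "wa", "st"]


def parse_vmstat_line(line):
    """Map one vmstat data line to the metrics used by the paging monitor."""
    fields = dict(zip(VMSTAT_COLUMNS, (int(value) for value in line.split())))
    return {
        "runq": fields["r"],        # processes waiting for run time
        "blockq": fields["b"],      # processes in uninterruptible sleep
        "swapin": fields["si"],     # memory swapped in from disk per second
        "swapout": fields["so"],    # memory swapped out to disk per second
        "usrcpu": fields["us"],
        "syscpu": fields["sy"],
        "idlecpu": fields["id"],
        "iowaitcpu": fields["wa"],
    }


# Hypothetical data line, values invented for illustration.
sample_line = " 2  1      0 8123456 123456 2345678    4    7     5    12  100  200  3  1 95  1  0"
print(parse_vmstat_line(sample_line))
```

Naming the columns once makes the positional dependency explicit, which helps when the script is moved to a system whose `vmstat` prints a different set of columns.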


Example A-4 shows the disk device to file system mapping Korn shell script 
disk2fs.ksh. 

Example A-4 disk2fs.ksh 

#!/bin/ksh

# Script Name: disk2fs.ksh
# Author : Patrick Thoreson
# Company : IBM
# Date : Sep 24th, 2010

if [ $# != 1 ]
then
    echo "Usage is: $0 device"
    exit 1
fi

device=${1};export device
###echo "DEBUG Device: >${device}<"
if [ ! -e /dev/${device} ]
then
    echo "${device} does not exist."
    exit 2
fi

device_long_desc=`ls -l /dev/${device}`;export device_long_desc
device_type=`echo ${device_long_desc} | cut -d' ' -f4`;export device_type

if [ "${device_type}" != "disk" ]
then
    echo "Device is not a disk: >${device_type}<"
    exit 3
fi
###echo "DEBUG Device type: >${device_type}<"

device_major_nbr=`echo ${device_long_desc} | cut -d' ' -f5|cut -d',' -f1`;export device_major_nbr
###echo "DEBUG Device Major number: >${device_major_nbr}<"

device_minor_nbr=`echo ${device_long_desc} | cut -d' ' -f6`;export device_minor_nbr
###echo "DEBUG Device Minor number: >${device_minor_nbr}<"

lvs --noheadings -o lv_name,vg_name,lv_kernel_major,lv_kernel_minor,devices --separator : > /tmp/lvs.txt 2> /dev/null

lvsinfo='';export lvsinfo
lvsinfo=`grep ':'${device_major_nbr}':'${device_minor_nbr}':' /tmp/lvs.txt | cut -c3-`
if [ "${lvsinfo}x" = "x" ]
then
    lvsinfo=`grep '/dev/'${device}'(' /tmp/lvs.txt | cut -c3-`
    if [ "${lvsinfo}x" = "x" ]
    then
        exit 4
###  else
###      echo "DEBUG lvsinfo: >${lvsinfo}<"
    fi
###else
###  echo "DEBUG lvsinfo: >${lvsinfo}<"
fi

lvname=`echo ${lvsinfo} | cut -d: -f1`;export lvname
###echo "DEBUG lvname: >${lvname}<"
vgname=`echo ${lvsinfo} | cut -d: -f2`;export vgname
###echo "DEBUG vgname: >${vgname}<"

lvdevice='/dev/'${vgname}'/'${lvname};export lvdevice
###echo "DEBUG lvdevice: >${lvdevice}<"

fstabinfo='';export fstabinfo
fstabinfo=`grep ${lvdevice} /etc/fstab`
###echo "DEBUG fstab info: >${fstabinfo}<"

fsmountdir=`echo ${fstabinfo} | cut -d' ' -f2`;export fsmountdir

echo `hostname`": I/O device ${device} --> filesystem mountdir: ${fsmountdir} (LV: ${lvdevice})"
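
The core lookup in disk2fs.ksh is a match on the kernel major and minor numbers in the colon-separated `lvs` report. The Python fragment below sketches that lookup on text that has already been captured, so it needs no LVM installation to run. The function name `find_lv_by_major_minor` and the volume names in the sample are hypothetical; the field order follows the `-o lv_name,vg_name,lv_kernel_major,lv_kernel_minor,devices` option list used in the scripts.

```python
def find_lv_by_major_minor(lvs_output, major, minor):
    """Return (lv_name, vg_name) for the row with matching kernel major/minor, else None."""
    for line in lvs_output.splitlines():
        # Expected fields: lv_name:vg_name:lv_kernel_major:lv_kernel_minor:devices
        fields = line.strip().split(":")
        if len(fields) < 5:
            continue  # skip blank or malformed lines
        lv_name, vg_name, lv_major, lv_minor = fields[:4]
        if lv_major == str(major) and lv_minor == str(minor):
            return lv_name, vg_name
    return None


# Hypothetical lvs report (two leading blanks per line, as the scripts assume).
sample_lvs = "  lvstage:vgdata:253:1:/dev/sdb(0)\n  lvhome:vgdata:253:2:/dev/sdc(0)\n"

match = find_lv_by_major_minor(sample_lvs, 253, 2)
if match:
    lv_name, vg_name = match
    # The logical volume device path is then /dev/<vg_name>/<lv_name>.
    print("/dev/%s/%s" % (vg_name, lv_name))
```

Comparing whole fields rather than searching for a substring avoids a false match such as minor number 1 matching 10, which a plain pattern search has to prevent by including the surrounding colons.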
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Example A-5 shows the file system to disk device mapping Korn shell script 
fs2disk.ksh. 


Example A-5 fs2disk.ksh 


#!/bin/ksh

# Script Name: fs2disk.ksh
# Author : Patrick Thoreson
# Company : IBM
# Date : Sep 23rd, 2010

if [ $# != 1 ]
then
    echo "Usage is: $0 <filesystem mount directory>"
    echo "Ex: $0 /stage2"
    exit 1
fi

fsmountdir=${1};export fsmountdir
#echo "DEBUG fsmountdir: >${fsmountdir}<"

if [ ! -e ${fsmountdir} ]
then
    echo "filesystem mount directory ${fsmountdir} does not exist."
    exit 2
fi

fstabinfo='';export fstabinfo
tmpfsdir='';export tmpfsdir

cat /etc/fstab | while read fstabinfo
do
    tmpfsdir=`echo ${fstabinfo} | tr -s ' ' | cut -d' ' -f2`
#   if [ "${tmpfsdir}" = "${fsmountdir}" ]
    if [ "${tmpfsdir}" = "${fsmountdir}" -o "${tmpfsdir}" = "${fsmountdir}/" ]
    then
        lvdevice=`echo ${fstabinfo} | tr -s ' ' | cut -d' ' -f1`
        break
    fi
done

if [ "${tmpfsdir}" != "${fsmountdir}" -a "${tmpfsdir}" != "${fsmountdir}/" ]
then
    echo "File system ${fsmountdir} not found in /etc/fstab."
    exit 3
fi
#echo "DEBUG lvdevice: >${lvdevice}<"
#echo "DEBUG lvname: >${lvname}<"

vgname=`echo ${lvdevice} | cut -d'/' -f3`;export vgname
#echo "DEBUG vgname: >${vgname}<"

lvs --noheadings -o lv_name,vg_name,lv_kernel_major,lv_kernel_minor,devices --separator : ${lvdevice} > /tmp/lvs.txt 2> /dev/null

lvsinfo='';export lvsinfo
device_major_nbr='';export device_major_nbr
device_minor_nbr='';export device_minor_nbr
majorminor='';export majorminor
otherdeviceinfo='';export otherdeviceinfo
otherdevicelist='';export otherdevicelist
cat /tmp/lvs.txt | cut -c3- | while read lvsinfo
do
    device_major_nbr=`echo ${lvsinfo} | cut -d: -f3`
#   echo "DEBUG Device Major number: >${device_major_nbr}<"
    device_minor_nbr=`echo ${lvsinfo} | cut -d: -f4`
#   echo "DEBUG Device Minor number: >${device_minor_nbr}<"
    majorminor=${device_major_nbr}', '${device_minor_nbr}
#   echo "DEBUG majorminor : >${majorminor}<"
    device=`ls -ld /dev/* | tr -s ' ' | grep ' disk ' | grep " ${majorminor} " | cut -d'/' -f3`
#   echo "DEBUG Device : >${device}<"
    otherdeviceinfo=`echo ${lvsinfo} | cut -d: -f5`
#   echo "DEBUG Device info: >${otherdeviceinfo}<"
    otherdevicelist=`echo ${otherdeviceinfo} | sed '1,$s/\/dev\///g' | sed '1,$s/([0-9]*)//g'`
#   echo "DEBUG Device list: >${otherdevicelist}<"

    echo `hostname`": filesystem mountdir ${fsmountdir} (LV: ${lvdevice}) ===> I/O device ${device}, other device(s) ${otherdevicelist}."
done
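
The first half of fs2disk.ksh finds the device that /etc/fstab associates with a mount directory, accepting the directory with or without a trailing slash. The Python fragment below sketches that lookup on fstab text passed in as a string. The function name `find_fstab_device` and the sample entries are hypothetical, and the handling of comment lines is an addition that the shell script does not perform.

```python
def find_fstab_device(fstab_text, mount_dir):
    """Return the device (first fstab field) mounted at mount_dir, or None."""
    # Normalize so that "/stage2" and "/stage2/" compare equal; keep "/" as is.
    wanted = mount_dir.rstrip("/") or "/"
    for line in fstab_text.splitlines():
        fields = line.split()
        if len(fields) < 2 or fields[0].startswith("#"):
            continue  # skip blank lines and comments
        if (fields[1].rstrip("/") or "/") == wanted:
            return fields[0]
    return None


# Hypothetical fstab content.
sample_fstab = """\
/dev/sda1            /         ext3  defaults  1 1
# staging area
/dev/vgdata/lvstage  /stage2   ext3  defaults  1 2
"""
print(find_fstab_device(sample_fstab, "/stage2/"))
```

Once the logical volume device is known, the script passes it to `lvs` to obtain the kernel major and minor numbers and the underlying physical devices.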
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B 


Scripts for DB2 workload manager configuration


In this appendix we provide the scripts used in 7.2, “DB2 workload manager” on 
page 243, which show how to configure a DB2 workload manager for an IBM 
Smart Analytics System. 


© Copyright IBM Corp. 2011. All rights reserved.




B.1 Creating MARTS tables 

This section describes how to create the tables used to test the DB2 workload 
manager work action set. 

For our workload management scripts, we modify the DB2-provided scripts under
<DB2 home directory>/samples/data to create and populate four tables under
the MARTS schema:

► Fact table: PRCHS_PRFL_ANLYSIS

► Dimension tables: STORE, TIME, and PRODUCT

Example B-1 shows the modified version of the createMartTables.sql script that
creates the tables. We also change the table space definition from USERSPACE1 to
TS_SMALL, the table space for non-partitioned tables. Do not run RUNSTATS on
these tables after populating them; otherwise, the estimated timeron cost will be
much lower and the work action set will not redirect the queries to the intended
service subclasses during the exercises.

Example B-1 Script MARTS_create_tables.sql 


-- MARTS_create_tables.sql

-- This script creates the sample tables used to optionally test the
-- Work Action Set timeron ranges


DROP TABLE MARTS.TIME;

DROP TABLE MARTS.STORE;

DROP TABLE MARTS.PRCHS_PRFL_ANLYSIS;

DROP TABLE MARTS.PRODUCT;

DROP SCHEMA MARTS RESTRICT;

CREATE SCHEMA MARTS;

CREATE TABLE MARTS.TIME (
    TIME_ID            SMALLINT NOT NULL,
    UNQ_ID_SRC_STM     CHAR(20),
    TIME_TP_ID         SMALLINT NOT NULL,
    CDR_YR             SMALLINT,
    CDR_QTR            SMALLINT,
    CDR_MO             SMALLINT,
    DAY_OF_CDR_YR      SMALLINT,
    DAY_CDR_QTR        SMALLINT,
    DAY_CDR_MO         SMALLINT,
    FSC_YR             SMALLINT,
    FSC_QTR            SMALLINT,
    FSC_MO             SMALLINT,
    NBR_DYS            SMALLINT,
    NBR_BSN_DYS        SMALLINT,
    PBLC_HOL_F         SMALLINT,
    BSN_DAY_F          SMALLINT,
    LAST_BSN_DAY_MO_F  SMALLINT,
    SSON_ID            SMALLINT,
    MONTH_LABEL        VARCHAR(20),
    QTR_LABEL          VARCHAR(10)
)
IN TS_SMALL;

CREATE TABLE MARTS.STORE (
    STR_IP_ID          INTEGER NOT NULL,
    STR_TP_NM          VARCHAR(64) NOT NULL,
    ORG_IP_ID          INTEGER NOT NULL,
    PRN_OU_IP_ID       INTEGER,
    MGR_EMPE_ID        INTEGER,
    NR_CPTR_PRX_NM     VARCHAR(64),
    SALE_VOL_RNG_NM    VARCHAR(64),
    FLRSP_AREA_RNG_NM  VARCHAR(64),
    STR_CODE           CHAR(6) NOT NULL,
    STR_SUB_DIV_NM     VARCHAR(64) NOT NULL,
    STR_REG_NM         VARCHAR(64) NOT NULL,
    STR_DIS_NM         VARCHAR(64) NOT NULL,
    STR_NM             VARCHAR(64) NOT NULL
)
IN TS_SMALL;

CREATE TABLE MARTS.PRCHS_PRFL_ANLYSIS (
    STR_IP_ID           INTEGER NOT NULL,
    PD_ID               INTEGER NOT NULL,
    TIME_ID             SMALLINT NOT NULL,
    NMBR_OF_MRKT_BSKTS  INTEGER,
    NUMBER_OF_ITEMS     INTEGER,
    PRDCT_BK_PRC_AMUNT  DECIMAL(14,2),
    CST_OF_GDS_SLD_CGS  DECIMAL(14,2),
    SALES_AMOUNT        DECIMAL(14,2)
)
IN TS_SMALL;

CREATE TABLE MARTS.PRODUCT (
    PD_ID           INTEGER NOT NULL,
    UNQ_ID_SRC_STM  CHAR(20),
    PD_TP_NM        VARCHAR(64) NOT NULL,
    BASE_PD_ID      INTEGER,
    NM              VARCHAR(64),
    PD_IDENT        CHAR(25),
    DSC             VARCHAR(256),
    PD_DEPT_NM      VARCHAR(64) NOT NULL,
    PD_SUB_DEPT_NM  VARCHAR(64) NOT NULL,
    PD_CL_NM        VARCHAR(64) NOT NULL,
    PD_SUB_CL_NM    VARCHAR(64) NOT NULL
)
IN TS_SMALL;
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To load data into the MARTS tables, use the MARTS_load_tables.sql script 
shown in Example B-2. 

Example B-2 Script MARTS_load_tables.sql 


-- MARTS_load_tables.sql

-- This script loads data into the 4 MARTS tables used to help
-- in configuring the WLM.


LOAD from MartPrchProfAnalysis.txt of del REPLACE into MARTS.PRCHS_PRFL_ANLYSIS;

LOAD from MartPD.txt of del REPLACE into MARTS.PRODUCT;

LOAD from MartStore.txt of del REPLACE into MARTS.STORE;

LOAD from MartTime.txt of del REPLACE into MARTS.TIME;

select 'PRCHS_PRFL_ANLYSIS', count(*) from MARTS.PRCHS_PRFL_ANLYSIS UNION
select 'PRODUCT', count(*) from MARTS.PRODUCT UNION
select 'STORE', count(*) from MARTS.STORE UNION
select 'TIME', count(*) from MARTS.TIME;


Use MARTS_count_tables.sql shown in Example B-3 to count the rows of the 
MARTS tables. 

Example B-3 Script MARTS_count_tables.sql 


-- MARTS_count_tables.sql

-- This script counts the rows in all MARTS tables


select 'PRCHS_PRFL_ANLYSIS', count(*) from MARTS.PRCHS_PRFL_ANLYSIS UNION
select 'PRODUCT', count(*) from MARTS.PRODUCT UNION
select 'STORE', count(*) from MARTS.STORE UNION
select 'TIME', count(*) from MARTS.TIME;


To drop the MARTS tables, use the script shown in Example B-4.

Example B-4 Script MARTS_drop_tables.sql


-- MARTS_drop_tables.sql

-- This script drops the sample tables used to optionally test the
-- Work Action Set timeron ranges


DROP TABLE MARTS.TIME;

DROP TABLE MARTS.STORE;

DROP TABLE MARTS.PRCHS_PRFL_ANLYSIS;

DROP TABLE MARTS.PRODUCT;


DROP SCHEMA MARTS RESTRICT;


B.2 Untuned DB2 workload manager configuration 

These scripts are used in the untuned DB2 workload manager environment
exercise.

Example B-5 shows the script to create service classes.

Example B-5 01_create_svc_classes.sql 

-- Script 01_create_svc_classes.sql

-- This script creates:
--    service superclass MAIN
--    service subclasses ETL, Trivial, Minor, Simple, Medium and Complex
--
-- To delete a service superclass you need to drop every dependent object:
--    remap the SYSDEFAULTUSERWORKLOAD back to SYSDEFAULTUSERCLASS
--    (if applicable)
--    disable the service subclasses
--    drop work action sets
--    drop work class sets
--    drop service classes' thresholds
--    drop service subclasses
--    drop service superclass

CREATE SERVICE CLASS MAIN;

CREATE SERVICE CLASS ETL under MAIN COLLECT AGGREGATE ACTIVITY DATA EXTENDED;
commit;

CREATE SERVICE CLASS Trivial under MAIN COLLECT AGGREGATE ACTIVITY DATA EXTENDED;

CREATE SERVICE CLASS Minor under MAIN COLLECT AGGREGATE ACTIVITY DATA EXTENDED;
commit;

CREATE SERVICE CLASS Simple under MAIN COLLECT AGGREGATE ACTIVITY DATA EXTENDED;
commit;

CREATE SERVICE CLASS Medium under MAIN COLLECT AGGREGATE ACTIVITY DATA EXTENDED;
commit;

CREATE SERVICE CLASS Complex under MAIN COLLECT AGGREGATE ACTIVITY DATA EXTENDED;
commit;


-- Verify existing super and sub service classes
select
    varchar(serviceclassname,30) as SvcClass_name,
    varchar(parentserviceclassname,30) as Parent_Class_name
from syscat.serviceclasses
where parentserviceclassname = 'MAIN';


Example B-6 shows the script to remap the SYSDEFAULTUSERWORKLOAD out
from SYSDEFAULTUSERCLASS and into the MAIN superclass.

Example B-6 02_remap_dft_wkl.sql


-- Script 02_remap_dft_wkl.sql

-- This script will remap the SYSDEFAULTUSERWORKLOAD out from SYSDEFAULTUSERCLASS
-- and into the newly created MAIN superclass

echo ;

echo ===== Original defaultUSERworkload mapping ===== ;

select
   varchar(workloadname,25) as Workload_name,
   varchar(serviceclassname,20) as SvClass_name,
   varchar(parentserviceclassname,20) as Parent_Class_name,
   EvaluationOrder as Eval_Order
FROM
   syscat.workloads
ORDER by 4;


alter workload SYSDEFAULTUSERWORKLOAD SERVICE CLASS MAIN ;

commit;


echo ===== Remapped default workload ===== ;

select
   varchar(workloadname,25) as Workload_name,
   varchar(serviceclassname,20) as SvClass_name,
   varchar(parentserviceclassname,20) as Parent_Class_name,
   EvaluationOrder as Eval_Order
FROM
   syscat.workloads
ORDER by 4;


Example B-7 shows the script to create new work class sets and work action
sets.

Example B-7 03_create_wk_action_set.sql


-- Script 03_create_wk_action_set.sql

-- This script creates the WORK_CLASS_SET and the WORK_ACTION_SET
-- with the starting values for the service subclasses
-- as described earlier, in "Untuned DB2 workload manager environment"


CREATE WORK CLASS SET "WORK_CLASS_SET_1"
(
   WORK CLASS "WCLASS_TRIVIAL" WORK TYPE DML FOR TIMERONCOST FROM 0 to 5000
      POSITION AT 1,
   WORK CLASS "WCLASS_MINOR" WORK TYPE DML FOR TIMERONCOST FROM 5000 to 30000
      POSITION AT 2,
   WORK CLASS "WCLASS_SIMPLE" WORK TYPE DML FOR TIMERONCOST FROM 30000 to 300000
      POSITION AT 3,
   WORK CLASS "WCLASS_MEDIUM" WORK TYPE DML FOR TIMERONCOST FROM 300000 to 5000000
      POSITION AT 4,
   WORK CLASS "WCLASS_COMPLEX" WORK TYPE DML FOR TIMERONCOST FROM 5000000 to UNBOUNDED
      POSITION AT 5,
   WORK CLASS "WCLASS_ETL" WORK TYPE LOAD POSITION AT 6,
   WORK CLASS "WCLASS_OTHER" WORK TYPE ALL POSITION AT 7
) ;

commit ;

echo SYSCAT.WORKCLASSSETS table contents ;

SELECT varchar(workclasssetname,40) as Work_Class_Set_name from SYSCAT.WORKCLASSSETS ;

echo =========================================== ;

echo SYSCAT.WORKCLASSES table contents ;

SELECT varchar(workclassname,20) as Work_Class_name, varchar(workclasssetname,20) as
Work_Class_Set_name, int(fromvalue) as From_value, int(tovalue) as To_value,
evaluationorder as Eval_order from SYSCAT.WORKCLASSES order by evaluationorder ;


CREATE WORK ACTION SET "WORK_ACTION_SET_1" FOR SERVICE CLASS "MAIN" USING WORK CLASS SET
"WORK_CLASS_SET_1"
(
   WORK ACTION "WACTION_TRIVIAL" ON WORK CLASS "WCLASS_TRIVIAL" MAP ACTIVITY WITHOUT
      NESTED TO "TRIVIAL",
   WORK ACTION "WACTION_MINOR" ON WORK CLASS "WCLASS_MINOR" MAP ACTIVITY WITHOUT
      NESTED TO "MINOR",
   WORK ACTION "WACTION_SIMPLE" ON WORK CLASS "WCLASS_SIMPLE" MAP ACTIVITY WITHOUT
      NESTED TO "SIMPLE",
   WORK ACTION "WACTION_MEDIUM" ON WORK CLASS "WCLASS_MEDIUM" MAP ACTIVITY WITHOUT
      NESTED TO "MEDIUM",
   WORK ACTION "WACTION_COMPLEX" ON WORK CLASS "WCLASS_COMPLEX" MAP ACTIVITY WITHOUT
      NESTED TO "COMPLEX",
   WORK ACTION "WACTION_ETL" ON WORK CLASS "WCLASS_ETL" MAP ACTIVITY WITHOUT
      NESTED TO "ETL"
) ;

commit;


echo SYSCAT.WORKACTIONSETS table contents ;

SELECT varchar(actionsetname,30) as Work_Action_Set_name, varchar(objectname,30) as
Object_name from SYSCAT.WORKACTIONSETS ;

echo =================================================== ;

echo SYSCAT.WORKACTIONS table contents ;

SELECT varchar(actionname,25) as Work_Action_name, varchar(actionsetname,25) as
Work_Action_Set_name, varchar(workclassname,25) as Work_Class_name from
SYSCAT.WORKACTIONS ;
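
The timeron ranges in the work class set of Example B-7 can be read as a simple lookup from estimated cost to service subclass. The following shell sketch is only an illustration and is not part of the original scripts: the function name wlm_subclass_for_cost is hypothetical, it handles whole-number costs only, and it assumes that the range bounds are inclusive and that work classes are evaluated in POSITION order, so a cost that sits on a shared boundary goes to the lower-numbered class. It covers DML statements only; LOAD activity maps to ETL regardless of cost.

```shell
# Hypothetical helper (not from the original scripts): print the service
# subclass that a DML statement with the given estimated cost, in whole
# timerons, would map to under the ranges of Example B-7.
wlm_subclass_for_cost() {
    cost=$1
    # Checked in POSITION order; the first matching range wins.
    if   [ "$cost" -le 5000 ];    then echo TRIVIAL
    elif [ "$cost" -le 30000 ];   then echo MINOR
    elif [ "$cost" -le 300000 ];  then echo SIMPLE
    elif [ "$cost" -le 5000000 ]; then echo MEDIUM
    else                               echo COMPLEX
    fi
}
```

Called as wlm_subclass_for_cost 12000, the sketch prints MINOR. Comparing its answer with the estimated cost reported for a test query is a quick way to predict which subclass counter should increase.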


Example B-8 shows the script to create the DB2 workload manager table space.

Example B-8 04_create_wlm_tablespace.sql


-- Script 04_create_wlm_tablespace.sql

-- This script creates the table space for the WLM tables over
-- all DB2 database partitions.

-- WLM data gathered for DB2 database partitions whose table space/WLM control tables
-- are nonexistent will be discarded!

CREATE TABLESPACE TS_WLM_MON MAXSIZE 2G;


Example B-9 shows the script 05_wlmevmon.ddl to create event monitors. 
Example B-9 05_wlmevmon.ddl 


-- Script 05_wlmevmon.ddl
-- -*- sql -*-

-- Sample DDL to create three workload management
-- event monitors.

-- -> assumes db2start issued
-- -> assumes connection to a database exists
-- -> assumes called by "db2 -tf wlmevmon.ddl"

-- -> Other notes:

--  - All target tables will be created in the table space named
--    TS_WLM_MON. Change this if necessary.

--  - Any specified table spaces must exist prior to executing this DDL.
--    Furthermore they should reside across all partitions; otherwise
--    monitoring information may be lost. Also, make sure they have space
--    to contain data from the event monitors.

--  - If the target table spaces are DMS table spaces, the PCTDEACTIVATE parameter
--    specifies how full the table space must be before the event monitor
--    automatically deactivates. Change the value if necessary. When the
--    target table space has auto-resize enabled, set PCTDEACTIVATE to 100.
--    Remove PCTDEACTIVATE for any specified System Managed (SMS) table
--    spaces.

--  - If AUTOSTART is specified, the event monitor will automatically
--    activate when the database activates. If MANUALSTART is specified
--    instead, the event monitor must be explicitly activated through
--    a SET EVENT MONITOR statement after the database is activated.

-- To remind users how to use this file! 

ECHO ; 

ECHO ******* IMPORTANT ********** ; 

ECHO ; 

ECHO USAGE: db2 -tf wlmevmon.ddl ; 

ECHO ; 

ECHO ******* IMPORTANT ********** ; 

ECHO ; 

ECHO ; 


-- Set autocommit off 

UPDATE COMMAND OPTIONS USING C OFF; 


-- Define the activity event monitor named DB2ACTIVITIES


CREATE EVENT MONITOR DB2ACTIVITIES
FOR ACTIVITIES
WRITE TO TABLE

ACTIVITY (TABLE ACTIVITY_DB2ACTIVITIES
IN TS_WLM_MON
PCTDEACTIVATE 100),

ACTIVITYSTMT (TABLE ACTIVITYSTMT_DB2ACTIVITIES
IN TS_WLM_MON
PCTDEACTIVATE 100),

ACTIVITYVALS (TABLE ACTIVITYVALS_DB2ACTIVITIES
IN TS_WLM_MON
PCTDEACTIVATE 100),

CONTROL (TABLE CONTROL_DB2ACTIVITIES
IN TS_WLM_MON
PCTDEACTIVATE 100)

AUTOSTART;


-- Define the statistics event monitor named DB2STATISTICS


CREATE EVENT MONITOR DB2STATISTICS 
FOR STATISTICS 
WRITE TO TABLE 

SCSTATS (TABLE SCSTATS_DB2STATISTICS 
IN TS_WLM_MON 
PCTDEACTIVATE 100), 

WCSTATS (TABLE WCSTATS_DB2STATISTICS 
IN TS_WLM_MON 
PCTDEACTIVATE 100), 

WLSTATS (TABLE WLSTATS_DB2STATISTICS 
IN TS_WLM_MON 
PCTDEACTIVATE 100), 

QSTATS (TABLE QSTATS_DB2STATISTICS 
IN TS_WLM_MON 
PCTDEACTIVATE 100), 

HISTOGRAMBIN (TABLE HISTOGRAMBIN_DB2STATISTICS
IN TS_WLM_MON
PCTDEACTIVATE 100),

CONTROL (TABLE CONTROL_DB2STATISTICS
IN TS_WLM_MON 
PCTDEACTIVATE 100) 

AUTOSTART; 


-- Define the threshold violation event monitor named DB2THRESHOLDVIOLATIONS

CREATE EVENT MONITOR DB2THRESHOLDVIOLATIONS
FOR THRESHOLD VIOLATIONS
WRITE TO TABLE

THRESHOLDVIOLATIONS (TABLE THRESHOLDVIOLATIONS_DB2THRESHOLDVIOLATIONS
IN TS_WLM_MON
PCTDEACTIVATE 100),

CONTROL (TABLE CONTROL_DB2THRESHOLDVIOLATIONS
IN TS_WLM_MON
PCTDEACTIVATE 100)

AUTOSTART; 


-- Commit work 


COMMIT WORK; 


Example B-10 shows the script to activate event monitors. 
Example B-10 06_start_evt_monitors.sql 


-- Script 06_start_evt_monitors.sql

-- This script starts the WLM monitors


echo Monitor switches status ;

SELECT substr(evmonname,1,30) as evmonname,
CASE
   WHEN event_mon_state(evmonname) = 0 THEN 'Inactive'
   WHEN event_mon_state(evmonname) = 1 THEN 'Active'
END as STATUS
FROM syscat.eventmonitors ;


set event monitor db2activities state 1 ;
set event monitor db2statistics state 1 ;
set event monitor db2thresholdviolations state 1 ;


echo Monitor switches status ;

SELECT substr(evmonname,1,30) as evmonname,
CASE
   WHEN event_mon_state(evmonname) = 0 THEN 'Inactive'
   WHEN event_mon_state(evmonname) = 1 THEN 'Active'
END as STATUS
FROM syscat.eventmonitors ;


Example B-11 shows the script to test the work action set setting.

Example B-11 07_execs_by_subclasses.sql


-- Script 07_execs_by_subclasses.sql

-- This script will display existing superclasses and subclasses,
-- and will execute some queries.
-- These queries have increasing timeron cost, so the Work Action Set
-- will send each of them to a particular service class.


echo ============== Workloads executed by Subclasses =================== ;

SELECT
   VARCHAR( SERVICE_SUPERCLASS_NAME, 20) SUPERCLASS,
   VARCHAR( SERVICE_SUBCLASS_NAME, 20) SUBCLASS,
   COORD_ACT_COMPLETED_TOTAL
FROM
   TABLE(WLM_GET_SERVICE_SUBCLASS_STATS_V97('','',-1)) AS T
WHERE
   SERVICE_SUPERCLASS_NAME like 'MAIN%' ;


echo executing queries... ;

echo ===== query to be mapped to the TRIVIAL service subclass ===== ;
select count(*) from MARTS.PRODUCT;

echo ===== query to be mapped to the MINOR service subclass ===== ;
select count(*) from MARTS.PRCHS_PRFL_ANLYSIS, MARTS.TIME;

echo ===== query to be mapped to the SIMPLE service subclass ===== ;
select count(*) from MARTS.PRCHS_PRFL_ANLYSIS, MARTS.TIME, MARTS.STORE;

echo ===== query to be mapped to the MEDIUM service subclass ===== ;
select count_big(*) from MARTS.PRCHS_PRFL_ANLYSIS, MARTS.PRCHS_PRFL_ANLYSIS;

echo ===== query to be mapped to the COMPLEX service subclass ===== ;

-- select count_big(*) from MARTS.PRODUCT, MARTS.Time, MARTS.Time;

echo ============== Workloads executed by Subclasses =================== ;

SELECT
   VARCHAR( SERVICE_SUPERCLASS_NAME, 20) SUPERCLASS,
   VARCHAR( SERVICE_SUBCLASS_NAME, 20) SUBCLASS,
   COORD_ACT_COMPLETED_TOTAL
FROM
   TABLE(WLM_GET_SERVICE_SUBCLASS_STATS_V97('','',-1)) AS T
WHERE
   SERVICE_SUPERCLASS_NAME like 'MAIN%' ;


Example B-12 shows the script for verifying the ETL service class.

Example B-12 08_etl_subclass.sql


-- Script 08_etl_subclass.sql

-- This script creates a table and loads data into it


create table db2admin.PRODUCT like marts.product;
declare mycursor cursor for select * from marts.product ;
load from mycursor of cursor replace into db2admin.product ;
drop table db2admin.product ;


echo = ;

echo ================== Executed workloads status ================== ;

SELECT
   VARCHAR( SERVICE_SUPERCLASS_NAME, 30) SUPERCLASS,
   VARCHAR( SERVICE_SUBCLASS_NAME, 20) SUBCLASS,
   COORD_ACT_COMPLETED_TOTAL
FROM
   TABLE(WLM_GET_SERVICE_SUBCLASS_STATS_V97('','',-1)) AS T
WHERE
   SERVICE_SUPERCLASS_NAME like 'MAIN%' ;


Example B-13 and Example B-14 show the queries for verifying the concurrency
workloads in a UNIX environment.


Example B-13 query_minor.sql (for UNIX)

select count(*) as Minor from MARTS.PRCHS_PRFL_ANLYSIS, MARTS.TIME ;

Example B-14 query_easy.sql (for UNIX)

select count(*) as Easy from MARTS.PRCHS_PRFL_ANLYSIS, MARTS.TIME, MARTS.STORE ;


Example B-15 shows the script for running the queries for verifying the
concurrency workloads in a UNIX environment. Replace db_name with your
database name.

Example B-15 09_conc_exec_Unix.sh

db2batch -d db_name -f query_minor.sql -a db2admin/ibm2blue -time off &

db2batch -d db_name -f query_minor.sql -a db2admin/ibm2blue -time off &

db2batch -d db_name -f query_minor.sql -a db2admin/ibm2blue -time off &

db2batch -d db_name -f query_minor.sql -a db2admin/ibm2blue -time off &

db2batch -d db_name -f query_easy.sql -a db2admin/ibm2blue -time off &

db2batch -d db_name -f query_easy.sql -a db2admin/ibm2blue -time off &

db2batch -d db_name -f query_easy.sql -a db2admin/ibm2blue -time off &
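
The repeated db2batch lines in Example B-15 can also be produced by a small loop, which makes it easier to vary the number of concurrent copies. The following shell sketch is an illustration only: the function name run_concurrent and the DRY_RUN variable are assumptions, not part of the original script. The db2batch options are the ones already used in Example B-15. By default the sketch prints the commands instead of running them.

```shell
# Hypothetical wrapper (not from the original scripts) around the commands
# in Example B-15. Arguments: database name, query file, number of copies,
# and user/password for the db2batch -a option.
# DRY_RUN defaults to 1, which only prints the commands.
run_concurrent() {
    db=$1
    query=$2
    copies=$3
    auth=$4
    i=0
    while [ "$i" -lt "$copies" ]; do
        if [ "${DRY_RUN:-1}" = "1" ]; then
            echo "db2batch -d $db -f $query -a $auth -time off &"
        else
            # Start each copy in the background so that the copies overlap.
            db2batch -d "$db" -f "$query" -a "$auth" -time off &
        fi
        i=$((i + 1))
    done
}
```

To reproduce Example B-15, set DRY_RUN=0, call run_concurrent db_name query_minor.sql 4 db2admin/ibm2blue and then run_concurrent db_name query_easy.sql 3 db2admin/ibm2blue, and finish with the shell's wait command so that the script does not exit before the background jobs complete.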


Example B-16 shows the script for checking concurrency in a UNIX
environment.

Example B-16 10_conc_check.sql

echo = ;

echo ===== Highest number of concurrent workload occurrences ===== ;
echo ===== (since last reset) ===== ;

SELECT CONCURRENT_WLO_TOP,
   SUBSTR (WORKLOAD_NAME,1,25) AS WORKLOAD_NAME
FROM TABLE(WLM_GET_SERVICE_SUBCLASS_STATS_V97('','',-1)) AS T
WHERE DBPARTITIONNUM = 0
ORDER BY WORKLOAD_NAME ;

echo ============== Workloads executed by Subclasses =================== ;

SELECT
   VARCHAR( SERVICE_SUPERCLASS_NAME, 27) SUPERCLASS,
   VARCHAR( SERVICE_SUBCLASS_NAME, 18) SUBCLASS,
   COORD_ACT_COMPLETED_TOTAL as NUMBER_EXECS,
   CONCURRENT_ACT_TOP as CONC_HWM
FROM
   TABLE(WLM_GET_SERVICE_SUBCLASS_STATS_V97('','',-1)) AS T
-- WHERE
--    SERVICE_SUPERCLASS_NAME like 'MAIN%'
;

Example B-17 and Example B-18 show the queries for verifying the concurrency 
workloads in a Windows environment. 

Example B-17 query_medium.txt (for Windows)

connect to sample2 USER user4 USING password;
set schema schema_name ;

select count(*) as medium from empmdc, empmdc ;


Example B-18 query_easy.txt (for Windows)

connect to sample2 USER USER4 USING password;
set schema schema_name ;

select count(*) as easy from empmdc, staff, staff ;


Example B-19 shows the 09a_conc_exec_Win.bat script to run the queries for the
concurrency test in a Windows environment. Use the same script to see the
results.

Example B-19 Script 09a_conc_exec_Win.bat

REM 09a_conc_exec_Win.bat 
REM Starts 4 concurrent medium workloads 
db2cmd -c db2 -tf query_medium.txt 
db2cmd -c db2 -tf query_medium.txt 
db2cmd -c db2 -tf query_medium.txt 
db2cmd -c db2 -tf query_medium.txt 

REM Starts 3 concurrent easy workloads 
db2cmd -c db2 -tf query_easy.txt 
db2cmd -c db2 -tf query_easy.txt 
db2cmd -c db2 -tf query_easy.txt 





Example B-20 shows the commands to create timeout thresholds for service 
subclasses. 

Example B-20 Script 11_create_timeout_thresholds

-- Script 11_create_timeout_thresholds

-- This script creates elapsed time thresholds for service subclasses 


-- Create threshold for TRIVIAL subclass 

CREATE THRESHOLD TH_TIME_SC_TRIVIAL FOR SERVICE CLASS TRIVIAL UNDER MAIN ACTIVITIES 

ENFORCEMENT DATABASE ENABLE 

WHEN ACTIVITYTOTALTIME > 1 MINUTE 

COLLECT ACTIVITY DATA on COORDINATOR WITH DETAILS CONTINUE ; 


-- Create threshold for MINOR subclass --- 

CREATE THRESHOLD TH_TIME_SC_MINOR FOR SERVICE CLASS MINOR UNDER MAIN ACTIVITIES 

ENFORCEMENT DATABASE ENABLE 

WHEN ACTIVITYTOTALTIME > 5 MINUTES 

COLLECT ACTIVITY DATA on COORDINATOR WITH DETAILS CONTINUE ; 


-- Create threshold for SIMPLE subclass 

CREATE THRESHOLD TH_TIME_SC_SIMPLE FOR SERVICE CLASS SIMPLE UNDER MAIN ACTIVITIES 

ENFORCEMENT DATABASE ENABLE 

WHEN ACTIVITYTOTALTIME > 30 MINUTES 

COLLECT ACTIVITY DATA on COORDINATOR WITH DETAILS CONTINUE ; 


-- Create threshold for MEDIUM subclass 

CREATE THRESHOLD TH_TIME_SC_MEDIUM FOR SERVICE CLASS MEDIUM UNDER MAIN ACTIVITIES 

ENFORCEMENT DATABASE ENABLE 

WHEN ACTIVITYTOTALTIME > 60 MINUTES 

COLLECT ACTIVITY DATA on COORDINATOR WITH DETAILS CONTINUE ; 


-- Create threshold for COMPLEX subclass 


-- elapsed time: 

CREATE THRESHOLD TH_TIME_SC_COMPLEX FOR SERVICE CLASS COMPLEX UNDER MAIN ACTIVITIES 

ENFORCEMENT DATABASE ENABLE 

WHEN ACTIVITYTOTALTIME > 240 MINUTES 

COLLECT ACTIVITY DATA on COORDINATOR WITH DETAILS CONTINUE ; 
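

The five statements in Example B-20 differ only in the subclass name and the time limit, so the pattern can be expressed as a list of pairs. The following shell sketch is an illustration and not part of the original script: the function name emit_timeout_thresholds and the SUBCLASS:MINUTES argument format are assumptions. It only prints the statements and does not run them against a database.

```shell
# Hypothetical generator (not from the original scripts): print one
# CREATE THRESHOLD statement, in the form used by Example B-20, for each
# SUBCLASS:MINUTES argument.
emit_timeout_thresholds() {
    for pair in "$@"; do
        subclass=${pair%%:*}   # text before the colon
        minutes=${pair##*:}    # text after the colon
        echo "CREATE THRESHOLD TH_TIME_SC_${subclass} FOR SERVICE CLASS ${subclass} UNDER MAIN ACTIVITIES"
        echo "   ENFORCEMENT DATABASE ENABLE"
        echo "   WHEN ACTIVITYTOTALTIME > ${minutes} MINUTES"
        echo "   COLLECT ACTIVITY DATA ON COORDINATOR WITH DETAILS CONTINUE ;"
    done
}
```

Calling emit_timeout_thresholds TRIVIAL:1 MINOR:5 SIMPLE:30 MEDIUM:60 COMPLEX:240 prints statements with the same limits as Example B-20. Redirecting the output to a file gives a script that can be reviewed before it is run with db2 -tf.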


Example B-21 is an example of how to check concurrency in the event monitor
tables.

Example B-21 Script 12_subclass_concurrency.sql


-- Script 12_subclass_concurrency.sql

-- This script queries the WLM statistics tables to show the
-- number of concurrent query executions by subclass and by time

-- Change the timestamps below to the desired period


SELECT
   concurrent_act_top,
   varchar(service_subclass_name,20) as subclass,
   varchar(service_superclass_name,30) as superclass,
   statistics_timestamp
FROM
   scstats_db2statistics
WHERE
-- date(statistics_timestamp) = current date
   statistics_timestamp between
      '2010-11-15-15.00.00' and
      '2010-11-15-15.30.00' ;


In Example B-22, 13_alter_default_workload is the script to start collecting
statistics in the default workload.

Example B-22 Script 13_alter_default_workload

-- Script 13_alter_default_workload.sql

-- This script starts collecting default workload statistics

alter workload sysdefaultuserworkload collect activity data on coordinator with details ;

Example B-23 shows the script to obtain data stored in the statistics tables.

Example B-23 14_dftwkload_statements.sql

-- Script 14_dftwkload_statements.sql

-- This script selects the statements captured at the default workload
-- (e.workload_id = 1, which is the defaultuserworkload)
-- along with some other details, like: user, application, date, time,
-- superclass and subclass for the current day.

SELECT varchar(session_auth_id,15) as user_name,
   varchar(appl_name,10) as appl_name,
   varchar(workloadname,25) as workload_name,
   varchar(service_superclass_name,10) as superclass,
   varchar(service_subclass_name,10) as subclass,
   date(time_started) as date,
   time(time_started) as time,
   varchar(stmt_text, 150) as statement_text
FROM ACTIVITYSTMT_DB2ACTIVITIES s, ACTIVITY_DB2ACTIVITIES e, syscat.workloads w
WHERE s.activity_id = e.activity_id
   AND s.appl_id = e.appl_id
   AND s.uow_id = e.uow_id
   AND e.workload_id = 1
   AND e.workload_id = w.workloadid

-- uncomment next row to obtain queries captured today

   AND date(e.time_started) = date (current timestamp)

-- or adjust date and uncomment next row to obtain queries captured at selected day

-- and date(e.time_started) = date ('11/02/2010')

FETCH first 50 rows only ;


B.3 Tuned DB2 workload manager configuration 

These scripts are used for the tuned DB2 workload manager environment 
exercise. 

Example B-24 shows the script to create DB2 roles. 

Example B-24 Script 50_create_roles.sql 

-- Script 50_create_roles.sql
-- This script creates DB2 roles.

-- The idea is to group similar users into one of the roles


CREATE ROLE Adhoc ;

GRANT ROLE Adhoc TO USER user1 ;
GRANT ROLE Adhoc TO USER user2 ;
GRANT ROLE Adhoc TO USER user3 ;


CREATE ROLE DBAs ;

GRANT ROLE DBAs TO USER user4 ;
GRANT ROLE DBAs TO USER user5 ;


CREATE ROLE PWRUSR ;

GRANT ROLE PWRUSR TO USER user6 ;
GRANT ROLE PWRUSR TO USER user7 ;


CREATE ROLE GUEST ;

GRANT ROLE GUEST TO USER user8 ;
GRANT ROLE GUEST TO USER user9 ;


Example B-25 shows the script to create DB2 workload manager workload 
objects. 

Example B-25 51_create_workloads.sql 

-- Script 51_create_workloads.sql

-- This script creates DB2 WLM workloads


--alter workload w1 disable ;
--drop workload w1 ;

CREATE WORKLOAD W1
   SESSION_USER ROLE ('DBAS')
   SERVICE CLASS MAIN
   POSITION AT 1;

commit ;


GRANT USAGE on WORKLOAD W1 to public ;


CREATE WORKLOAD W2
   SESSION_USER ROLE ('ADHOC', 'PWRUSR')
   SERVICE CLASS MAIN
   POSITION AT 2;

commit;


GRANT USAGE on WORKLOAD W2 to public ;
commit;


CREATE WORKLOAD W3
   SESSION_USER ROLE ('GUEST')
   SERVICE CLASS MAIN
   POSITION AT 3;

commit;


GRANT USAGE on WORKLOAD W3 to public ;
commit;


Example B-26 shows the script for altering the defined thresholds.

Example B-26 Script 52_enforce_thresholds.sql
-- 52_enforce_thresholds 


-- This script alters the thresholds defined in WLM configuration phase 1

-- For concurrency thresholds, queries exceeding the limit will be
-- put on a queue.

-- For timeout thresholds, queries exceeding the limit will be
-- terminated.

-- Alter threshold for TRIVIAL subclass

ALTER THRESHOLD TH_TIME_SC_TRIVIAL
WHEN ACTIVITYTOTALTIME > 1 MINUTE

COLLECT ACTIVITY DATA on COORDINATOR WITH DETAILS STOP EXECUTION ;


-- Alter threshold for MINOR subclass

ALTER THRESHOLD TH_TIME_SC_MINOR
WHEN ACTIVITYTOTALTIME > 5 MINUTES

COLLECT ACTIVITY DATA on COORDINATOR WITH DETAILS STOP EXECUTION ;


-- Alter threshold for SIMPLE subclass

ALTER THRESHOLD TH_TIME_SC_SIMPLE
WHEN ACTIVITYTOTALTIME > 30 MINUTES

COLLECT ACTIVITY DATA on COORDINATOR WITH DETAILS STOP EXECUTION ;


-- Alter threshold for MEDIUM subclass

ALTER THRESHOLD TH_TIME_SC_MEDIUM
WHEN ACTIVITYTOTALTIME > 60 MINUTES

COLLECT ACTIVITY DATA on COORDINATOR WITH DETAILS STOP EXECUTION ;


-- Alter threshold for COMPLEX subclass

-- elapsed time:

ALTER THRESHOLD TH_TIME_SC_COMPLEX
WHEN ACTIVITYTOTALTIME > 240 MINUTES

COLLECT ACTIVITY DATA on COORDINATOR WITH DETAILS STOP EXECUTION ;


-- Lists the existing thresholds and corresponding types

select varchar(THRESHOLDNAME,25) as Threshold_name, varchar(THRESHOLDPREDICATE,25) as
Threshold_Type, maxvalue from syscat.thresholds ;


Example B-27 shows the script to change the work class set definitions. 
Example B-27 53_alter_workclasses.sql 

-- Script 53_alter_workclasses.sql

-- This script changes the Work Class Set definitions
ALTER WORK CLASS SET "WORK_CLASS_SET_1"

-- ALTER WORK CLASS "WCLASS_TRIVIAL" FOR TIMERONCOST FROM 0 to 5000 POSITION AT 1

-- ALTER WORK CLASS "WCLASS_MINOR" FOR TIMERONCOST FROM 5000 to 30000 POSITION AT 2

ALTER WORK CLASS "WCLASS_SIMPLE" FOR TIMERONCOST FROM 30000 to 400000 POSITION AT 3

ALTER WORK CLASS "WCLASS_MEDIUM" FOR TIMERONCOST FROM 400000 to 5000000 POSITION AT 4

-- ALTER WORK CLASS "WCLASS_COMPLEX" FOR TIMERONCOST FROM 5000000 to UNBOUNDED POSITION AT 5
;

COMMIT ;





Related publications 


The publications listed in this section are considered particularly suitable for a 
more detailed discussion of the topics covered in this book. 


IBM Redbooks publications 

The following IBM Redbooks publication provides additional information about 
the topic in this document. Note that publications referenced in this list might be 
available in softcopy only. 

► DB2 Performance Expert for Multiplatforms V2.2, SG24-6470 

You can search for, view, or download Redbooks publications, Redpaper 
publications, Technotes, draft publications, and Additional materials, as well as 
order hardcopy Redbooks publications, at this website: 
ibm.com/redbooks 


Online resources 

These websites are also relevant as further information sources: 

► IBM Smart Analytics System: 

http://www.ibm.com/software/data/infosphere/smart-analytics-system/ 

► Database Management: 

http://www.ibm.com/software/data/management/

► DB2 9.7 Manuals: 

http://wwwl.ibm.com/support/docview.wss?rs=71&uid=swg27015148 

► DB2 9.7 Features and benefits: 

http://www-01.ibm.com/software/data/db2/9/features.html


Help from IBM


IBM Support and downloads: 
ibm.com/support 

IBM Global Services: 

ibm.com/services